Exceptions must be handled by ApplicationCore in a way that the application developer does not have to care much about it.
In case of a ChimeraTK::runtime_error exception the framework must catch the exception and report it to the DeviceModule. The DeviceModule handles this exception and periodically tries to open the device. In case of several devices only the faulty device is blocked. Even if a device is faulty it should not block the server from starting.
In case of a runtime_error exception the framework must catch the exception and report it to the DeviceModule. The DeviceModule handles this exception and periodically tries to open the device. In case of several devices only the faulty device is blocked. Even if a device is faulty it should not block the server from starting.
If an input variable is in the error state, it sets the DataValidity flag for its DataValidityPropagationExecutor (see \link spec_dataValidityPropagation \endlink) to faulty and the flag is propagated appropriately. After the exception is cleared and the operation returns without a data fault flag, the DataValidity flag is set to ok. Furthermore, the device must be reinitialised automatically and the values of the process variables must be recovered, as the device might have rebooted and the variables might have been reset.
...
...
@@ -16,13 +18,15 @@ If an input variable is in the error state, it sets the DataValidity flag for it
- b. An initialisation handler can be added to the DeviceModule in the user code. Initialisation handlers are callback functions which will be executed when a device is opened for the first time and after a device recovers from an exception, before any process variables are written.
- c. Initial values must be correctly propagated after a device is opened. See \link spec_initialValuePropagation \endlink. Especially, no read function (even readNonBlocking/readLatest) must return before an initial value has been received.
- d. (removed)
- e. A ChimeraTK::ExceptionHandlingDecorator is placed around all ChimeraTK::NDRegisterAccessors which connect a device to a ChimeraTK::ApplicationModule or fanout. (*)
- e. An ExceptionHandlingDecorator is placed around all NDRegisterAccessors which connect a device to an ApplicationModule or fanout. (*)
- f. (removed)
- g. By default a recovery accessor is added for each device register when it is obtained. These recovery accessors are used to correctly set the values of variables when the device is opened for the first time and after a device is recovered from an exception. (*)
- h. A ChimeraTK::ExceptionHandlingDecorator for an input knows its DataValidityPropagationExecutor, which lives in the ApplicationModule or fanout that reads the input. Like this it can propagate the
- h. An ExceptionHandlingDecorator for an input knows its DataValidityPropagationExecutor, which lives in the ApplicationModule or fanout that reads the input. Like this it can propagate the
dataValidity flag. Outputs do not send DataValidity faulty in case of exceptions (see \link spec_dataValidityPropagation \endlink).
- i. Write should not block in case of an exception for the outputs of ThreadedFanOut / TriggerFanOut. (*)
- j. Exception handling and invalid flag propagation has to be implemented such that it is transparent to a module whether it is directly connected to a device, or whether a fanout or another application module is in between.
- k. The server must start even if some devices are in error state. The devices which are working and all modules that do not talk to the broken device must work.
- l. After a device has been re-opened, all values that had once been written must be re-written.
<b>2. The Flow</b>
...
...
@@ -51,7 +55,7 @@ If an input variable is in the error state, it sets the DataValidity flag for it
- 2.4.2.1 If a write is not executed because the device is already faulty (from 2.2 or 2.6.1), the recovery accessor has to take care of this. In this case we always have to send another exception notification to the DeviceModule to make sure that the new recovery value is not missed (avoid race condition). (*)
- 2.5. When a read / write operation on the device (1.e) causes a ChimeraTK::runtime_error exception, the exception is caught in the ExceptionHandlingDecorator
- 2.5. When a read / write operation on the device (1.e) causes a runtime_error exception, the exception is caught in the ExceptionHandlingDecorator
- 2.5.1. If it is a read operation the DataValidityPropagationExecutor is informed that there was a device error. (*)
- 2.5.2. The error is reported to the DeviceModule
- 2.5.3. Action depending on the calling operation :
...
...
@@ -109,9 +113,9 @@ This is required for three reasons:
<b>4. The DeviceModule</b>
Interfaces
Interfaces:
- External interface
- 4.1 External interface
An error status and the last error message are automatically connected to the control system for each device
- /Devices/{AliasName}/message
...
...
@@ -121,8 +125,8 @@ Interfaces
- 4.2 Internal interface to the ExceptionHandlingDecorator
- 4.2.1 A thread safe function ChimeraTK::DeviceModule::reportException() (implements 2.5.2).
- 4.2.2 A blocking way to wait for the device to become available after reporting the exception (implements 2.3.7 and 2.4.1)
- 4.2.1 A thread safe function DeviceModule::reportException() (implements 2.5.2). It does not block but only puts the exception into a lock-free queue.
- 4.2.2 A blocking way to wait for the device to become available after reporting the exception (implements 2.3.7 and 2.4.1). This also serves as the response that the reported exception has been processed.
- 4.2.3 A shared mutex to prevent read and write operations before the device has been initialised (implements 1.b, 2.1, 2.3.6 and 2.6.1)
- 4.2.4 A function to add recoveryAccessors (implements 1.g)
- 4.2.5 A shared mutex to protect the recovery accessors
...
...
@@ -163,7 +167,7 @@ Comments:
<b>6. TriggerFanout and ThreadedFanOut </b>
- 6.1 TriggerFanout
Each ChimeraTK::TriggerFanOut reads several poll-type variables when a trigger (push type) is received. If one of the poll-type inputs is in error state, it shall not block the other variables.
Each TriggerFanOut reads several poll-type variables when a trigger (push type) is received. If one of the poll-type inputs is in error state, it shall not block the other variables.
To implement this, the TriggerFanout uses the non-blocking write provided by ExceptionHandlingDecorator (see 5.1.1). This is possible because TriggerFanOuts are only connected to device variables. (implements 1.i)
- 6.2 ThreadedFanOut
...
...
@@ -171,66 +175,56 @@ Comments:
<b>7. The server must always start even if a device is in error state.</b>
<b>7. The server must always start even if a device is in error state.</b>
Description.
Implementation of 1.k. This section extracts some points from 1. and 2. to put the bits and pieces into context.
To make sure that the server always starts, the initial opening of the device should take place in ChimeraTK::DeviceModule::handleException(), which has the exception handling loop, so that the device can go to the error state right at the beginning and the server can start even though not all of its devices are available.
To make sure that the server always starts, even if some or all devices are in error state, the initial opening of the device takes place in the DeviceModule thread (inside the exception handling loop).
The device module reports its status and error messages to the control system (see 2.1, 2.3.5, 2.6.1).
Does not fit here, but is the only place where handleException is mentioned:
- handleException() must not block.
Some initial values are already written in prepare(), before the threads are started. Writing these values must be delayed until the device is available. This is done by the same mechanism that is used to re-write the values after recovery. (see 10 and \link spec_initialValuePropagation \endlink)
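The start-up behaviour described in this section can be illustrated with a minimal sketch. It is not the ApplicationCore implementation: `DeviceStatus`, `openUntilSuccessful` and the injected `open` callable are hypothetical names, and `std::runtime_error` merely stands in for ChimeraTK::runtime_error. The loop is meant to run in the device thread, so a failing device only delays its own recovery while the status and message stay visible to the control system.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical status record, standing in for the status and message variables
// that are published per device. A device starts in the error state.
struct DeviceStatus {
  bool hasError = true;
  std::string message;
};

// Try to open the device until it succeeds. Intended to run in the device thread,
// so the rest of the server is not blocked while a device is unavailable.
// Returns the number of attempts that were needed.
inline int openUntilSuccessful(const std::function<void()>& open, DeviceStatus& status,
    std::chrono::milliseconds retryInterval) {
  int attempts = 0;
  while(true) {
    ++attempts;
    try {
      open();
      status.hasError = false;
      status.message.clear();
      return attempts;
    }
    catch(const std::runtime_error& e) { // stand-in for ChimeraTK::runtime_error
      status.hasError = true;
      status.message = e.what();
      std::this_thread::sleep_for(retryInterval);
    }
  }
}
```

In this sketch the caller decides in which thread the loop runs; delaying writes until the loop has returned is left out for brevity.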
Implementation.
<b>8. Propagating the DataValidity flag</b>
- ChimeraTK::DeviceModule::handleException()
If a device is in error state, all its output data is marked as invalid. This invalid flag shall be propagated through the connected modules such that all data that is calculated from these invalid values is also marked invalid (see \link spec_dataValidityPropagation \endlink). The ExceptionHandlingDecorator informs the DataValidityPropagationExecutor about the device state (error or ok, see 3.6.3 and 2.4.1.2.1)
To propagate the flag, the first blocking read after the device error returns the last value. As the DataValidityPropagationExecutor knows about the device error, the data invalid flag is turned on (2.5.3-read). In order to prevent modules from running unnecessarily with invalid data, the following read call blocks until the device has recovered.
<b>8. Propagate error flag</b>
After recovery the DataValidityPropagationExecutor is informed that the device is OK again, and the received DataValidity of the variable is propagated (usually 'ok', but if 'faulty' is received, the data validity stays faulty).
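The read behaviour described above can be sketched as a small state machine. All names here (`ValidityCounter`, `InputReadState`, `ReadAction`) are hypothetical and not part of ApplicationCore. `ValidityCounter` only stands in for the DataValidityPropagationExecutor, and the sketch assumes the device delivers 'ok' data after recovery, which simplifies the rule that a received 'faulty' keeps the validity faulty.

```cpp
#include <cassert>

// Hypothetical names throughout; a sketch, not the ApplicationCore API.
enum class DataValidity { ok, faulty };
enum class ReadAction { deliverDeviceValue, deliverLastValueAsFaulty, blockUntilRecovered };

// Counts faulty inputs; stands in for the DataValidityPropagationExecutor.
struct ValidityCounter {
  int faultyInputs = 0;
  DataValidity validity() const { return faultyInputs > 0 ? DataValidity::faulty : DataValidity::ok; }
};

// Per-input state kept by a decorator-like wrapper around one device input.
class InputReadState {
 public:
  explicit InputReadState(ValidityCounter& owner) : _owner(owner) {}

  // Decide what a blocking read does, given whether the device currently has an error.
  ReadAction onBlockingRead(bool deviceHasError) {
    if(!deviceHasError) {
      if(_reportedFaulty) {
        // Recovery: tell the owner that this input no longer carries a device error.
        assert(_owner.faultyInputs > 0);
        --_owner.faultyInputs;
        _reportedFaulty = false;
      }
      return ReadAction::deliverDeviceValue;
    }
    if(!_reportedFaulty) {
      // First read after the error: return the last value once, flagged as faulty.
      ++_owner.faultyInputs;
      _reportedFaulty = true;
      return ReadAction::deliverLastValueAsFaulty;
    }
    // Any further read waits for the device, so the module does not spin on invalid data.
    return ReadAction::blockUntilRecovered;
  }

 private:
  ValidityCounter& _owner;
  bool _reportedFaulty = false;
};
```

Returning an action instead of actually blocking keeps the sketch free of threads; a real decorator would wait on the DeviceModule at that point.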
Description.
<b>9. Device initialisation </b>
See 2.5.1.
This is partly a specification of the DeviceModule. As it is strongly connected with exception handling, and in fact handled by the same code, it is mentioned here.
For initial error propagation see <a href='spec_initialValuePropagation.html'>spec_initialValuePropagation</a>.
- 9.1 The user code can register initialisation handlers (in the constructor of the DeviceModule or using DeviceModule::addInitialisationHandler). They are executed each time after the device has successfully been opened (*)
- 9.2 Sometimes it is only possible to write parts of the device after a proper initialisation sequence (for instance reset-registers must be cleared, or communication clocks to sub-devices must be set). Hence no read or write operations must take place until this point, not even writing recovery accessors (implements 1.c, implemented by 4.2.3, 5.3.1.2 and 5.3.2.1).
- 9.3 The recovery accessors are written after the initialisation (implements 1.l).
- 9.4 The lock 4.2.3 is only released after all recovery accessors are written, so ApplicationModules which continue find the same state as before the error when writing or reading.
Implementation.
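The ordering required by 9.1 to 9.4 can be sketched as follows. `RecoverySequence` and its member names are hypothetical and only illustrate the locking order; the sketch assumes C++17 for `std::shared_mutex` and represents handlers and recovery writes as plain callables.

```cpp
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the recovery part of a DeviceModule; not the ApplicationCore API.
class RecoverySequence {
 public:
  using Step = std::function<void()>;

  void addInitialisationHandler(Step handler) { _initialisationHandlers.push_back(std::move(handler)); }
  void addRecoveryWrite(Step write) { _recoveryWrites.push_back(std::move(write)); }

  // Called after the device has been (re)opened successfully.
  void runAfterOpen() {
    // 9.2: hold the lock exclusively so that no regular transfer can take place.
    std::unique_lock<std::shared_mutex> exclusive(_deviceAccess);
    for(auto& handler : _initialisationHandlers) handler(); // 9.1: initialisation handlers first
    for(auto& write : _recoveryWrites) write();             // 9.3: then the recovery accessors
  } // 9.4: the lock is released only after everything has been written

  // Regular reads and writes take the lock in shared mode, so they can run concurrently
  // with each other but never during initialisation or recovery.
  template<typename Transfer>
  void transfer(Transfer&& doTransfer) {
    std::shared_lock<std::shared_mutex> shared(_deviceAccess);
    doTransfer();
  }

 private:
  std::shared_mutex _deviceAccess;
  std::vector<Step> _initialisationHandlers;
  std::vector<Step> _recoveryWrites;
};
```

Handlers and recovery writes must be registered before the threads start in this sketch, since the two vectors themselves are not protected.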
- ChimeraTK::ExceptionHandlingDecorator
- ChimeraTK::TriggerFanOut
<b>9. Initialise the device</b>
Description.
Comments:
- 9.1 Successfully opened means open() did not throw, and the device reports isFunctional() as true.
The device should be automatically initialised when opened for first time (2.4.1) and automatically re-initialised after recovery (2.5.3.4).
Implementation.
<b>10. Recover accessors</b>
A list of std::function objects is added to the DeviceModule. Initialisation handlers can be added through the constructor and the addInitialisationHandler() function. When the device recovers, all initialisation handlers in the list are executed.
- ChimeraTK::DeviceModule
- ChimeraTK::ExceptionHandlingDecorator
After a device has failed and recovered, it might have re-booted and lost the values of the process variables that live in the server and are written to the device. Hence these values have to be re-written after the device has recovered. The same holds for initial values which have been written before the device thread has started (see 7.), and even normal variables which have been written before the device is available, as several threads start asynchronously.
The writing after the recovery is done in the device thread. The regular register accessors (which are decorated with the ExceptionHandlingDecorator) belong to the ApplicationModule threads (or those of the fanouts), which can modify the user buffer any time. Hence the device thread cannot use these accessors in a thread-safe way. In addition, the device module has to remember the last value which has been written to restore a consistent state. The ApplicationModule might already have modified its user buffer, but not have written yet. Hence also for logical reasons this buffer cannot be used for recovery.
<b>10. Recover process variables after exception.</b>
As a consequence a copy has to be created whenever the data is written to the device. It is implemented by a so-called recovery accessor. This is a regular second accessor to the register whose accessor has been decorated with the ExceptionHandlingDecorator, but with the special usage that the data is set in the Application thread, and written in the DeviceModule thread.
Background.
- 10.1 The recovery accessor is created together with the normal accessor in the connection code (in DeviceModule::writeRecoveryOpen), registered at the DeviceModule and given to the recovery accessors.
After a device has failed and recovered, it might have re-booted and lost the values of the process variables that live in the server and are written to the device. Hence these values have to be re-written after the device has recovered.
- 10.2 Data is copied in doPreWrite(), before the original accessor's pre-write is called. This is the last occasion where the data is still guaranteed to be in the original accessor's user buffer. The accessor's pre-write might swap the data out, and it might never be available again (in case of a destructive write).
Description.
- 10.3 As the user buffer of the recovery accessor is written in an ApplicationModule or fanout thread, but read in the DeviceModule thread when recovering, it has to be protected by a mutex. For efficiency one single shared mutex is used. All ExceptionHandlingDecorators will acquire a shared lock, as each decorator only touches its own buffer. The DeviceModule, which writes all recovery accessors, uses the unique lock to prevent any ExceptionHandlingDecorator from modifying the user buffer while doing so.
Create a copy of the accessor when writing the data to the device and use this to recover the values when the device is available again. Recovery accessors do not write if the register has never been written before (2.5.3.5.).
- 10.4 All valid recovery accessors are written each time the device has been (re)-opened, after the initialisation handlers have been executed. If a recovery accessor has not seen an initial value yet, the version number is still nullptr, and the accessor is invalid. These accessors are not written. (implements 1.l)
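Points 10.2 to 10.4 can be illustrated with a minimal sketch. `RecoveryAccessor` and `RecoveryStore` are hypothetical types, not the ApplicationCore classes: the `hasValue` flag stands in for the version number check, and the device write is an injected callable. The sketch assumes C++17 for `std::shared_mutex`.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical recovery accessor: a private copy of the last value sent to one register.
struct RecoveryAccessor {
  std::vector<int32_t> buffer;
  bool hasValue = false; // stands in for "the version number is not nullptr"
};

// Hypothetical owner of all recovery accessors of one device.
class RecoveryStore {
 public:
  std::shared_ptr<RecoveryAccessor> add() {
    auto accessor = std::make_shared<RecoveryAccessor>();
    std::unique_lock<std::shared_mutex> lock(_mutex);
    _accessors.push_back(accessor);
    return accessor;
  }

  // 10.2 / 10.3: called in the application thread before the original pre-write.
  // A shared lock is sufficient because each decorator only touches its own accessor.
  void copyBeforeWrite(RecoveryAccessor& accessor, const std::vector<int32_t>& userBuffer) {
    std::shared_lock<std::shared_mutex> lock(_mutex);
    accessor.buffer = userBuffer;
    accessor.hasValue = true;
  }

  // 10.4: called in the device thread after the initialisation handlers have run.
  // The unique lock keeps all decorators from modifying their copies meanwhile.
  // Returns the number of accessors that were written.
  template<typename Write>
  std::size_t writeAll(Write&& writeToDevice) {
    std::unique_lock<std::shared_mutex> lock(_mutex);
    std::size_t written = 0;
    for(auto& accessor : _accessors) {
      if(!accessor->hasValue) continue; // never written before: nothing to restore
      writeToDevice(accessor->buffer);
      ++written;
    }
    return written;
  }

 private:
  std::shared_mutex _mutex;
  std::vector<std::shared_ptr<RecoveryAccessor>> _accessors;
};
```

Keeping a separate copy per register means the device thread never reads the application's user buffer, which is the thread-safety argument made above.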
Implementation.
- ChimeraTK::DeviceModule
- ChimeraTK::ExceptionHandlingDecorator
- A list of ChimeraTK::TransferElements is created as ChimeraTK::DeviceModule::writeRecoveryOpen which is populated in function ChimeraTK::DeviceModule::addRecoveryAccessor().
ChimeraTK::ExceptionHandlingDecorator is extended by adding a second accessor to the same register as the target accessor it is decorating.
<I> Data is copied in doPreWrite(). [TBD: Do we want this behaviour? => Yes, it has to happen before the original accessor's pre-write because this is the last occasion where the data is still guaranteed to be in our user buffer. The accessor's pre-write might swap the data out, and it might never be available again (in case of a destructive write).]</I>
- As the user buffer of the recovery accessor is written in an ApplicationModule or fanout thread, but read in the DeviceModule thread when recovering, it has to be protected by a mutex. For efficiency one single shared mutex is used. All ExceptionHandlingDecorators will acquire a shared lock, as each decorator only touches its own buffer. The DeviceModule, which writes all recovery accessors, uses the unique lock to prevent any ExceptionHandlingDecorator from modifying the user buffer while doing so.
<b>11. Known Bugs.</b>
<b>5. Known Bugs.</b>
FIXME: not updated
- Step 2.1 The initial value of deviceError is not set to 1.
...
...
@@ -248,3 +242,4 @@ ChimeraTK::ExceptionHandlingDecorator is extended by adding second accessor to t