diff --git a/doc/exceptionHandlingDesign.dox b/doc/exceptionHandlingDesign.dox new file mode 100644 index 0000000000000000000000000000000000000000..aeea5174e4b73d1551e2dd52edde96abb17021e7 --- /dev/null +++ b/doc/exceptionHandlingDesign.dox @@ -0,0 +1,123 @@ +/** +\page exceptionHandlingDesign Exception Handling Design +\section gen_idea General Idea + + +Exceptions must be handled by ApplicationCore in a way that the application developer does not have to care much about it. + +In case of a ChimeraTK::runtime_error exception the Application must catch the exception and report it to the DeviceModule. The DeviceModule should handle this exception and block the device until the device can be opened again. As there could be many devices, make sure only the faulty device is blocked. +Even if a device is faulty it should not block the server from starting. + +Once in error state, set the DataValidity flag for that module to faulty and propagate this to all of its output variables. After the exception is cleared and the operation returns without a data fault flag, set the DataValidity flag to ok. Furthermore, the device must be reinitialised automatically and the values of the process variables must be recovered, as the device might have rebooted and the variables might have been reset. + + +-<b>Add an exception handling and reporting mechanism to the device module (DeviceModule).</b> + +Description. + +Add two error state variables. +- "status" (boolean flag if error occurred) +- "message" (string with error message) +These variables are automatically connected to the control system in this format +- /Devices/{AliasName}/message +- /Devices/{AliasName}/status + +Add a thread-safe function reportException(). +A user/application can report an exception by calling reportException of DeviceModule with an exception string. The reportException function puts the exception in a queue and then blocks the calling thread. 
This queue is processed by an internal function handleException which updates the DeviceError variables (status=1 and message="YourExceptionString") and tries to open the device. Once the device can be opened the DeviceError variables are updated (status=0 and message="") and the blocked threads are notified to continue. Note that the operation that led to the exception (e.g. read or write) must be repeated after the exception is handled. + +Implementation. +- DeviceModule + + +-<b>Catch ChimeraTK::runtime_error exceptions.</b> + +Description. + +Catch all the ChimeraTK::runtime_error exceptions that could be thrown in read and write operations and feed the error state into the DeviceModule through the function DeviceModule::reportException(). NDRegisterAccessors coming from a device should be used as a single central point to catch these exceptions. +Retry the failed operation after reportException() returns. + +Implementation. + +It is done by placing an ExceptionHandlingDecorator around all NDRegisterAccessors coming from a device. +- NDRegisterAccessors +- Application + +-<b>Faulty device should not block any other device.</b> + +Description. + +Each TriggerFanOut deals with several variable networks at the same time, which are triggered by the same trigger. Each variable network has its own feeder and one or more consumers. You do not need to change anything about the variable networks. +On the other hand, the trigger itself is a variable network, too. The TriggerFanOut has a consumer of this trigger network. This is the accessor on which the blocking read() is called in the loop. You will need to create additional consumers in the trigger network, one for each TriggerFanOut. + +Implementation. + +- Application (Application::typedMakeConnection) + +-<b>The Server must always start even if a device is in error state.</b> + +Description. 
+ +To make sure that the server always starts, the initial opening of the device should take place in the DeviceModule itself, inside the exception handling loop, so that the device can go to the error state right at the beginning and the server can start even if not all of its devices are available. + +Implementation. + +- DeviceModule ( DeviceModule::handleException() ). + + +-<b>Set/clear fault flag of module in case of exception.</b> + +Background. + +A DataValidity flag of a module is set to faulty if any input variable returns with a set data fault flag after a read operation and is cleared once all inputs no longer have the data fault flag set. In a write operation, the module's data fault flag status is attached to the variable to write. +More detail ...(Martin's doc) + +Description. + +In case of a ChimeraTK::runtime_error exception this DataValidity flag should also be set to faulty and propagated to all outputs of the module. When the operation completes after clearing the exception state, the flag should be cleared as well. + +Implementation. +- ExceptionHandlingDecorator +- TriggerFanOut + +Additional note from code author. +Note that if the data is distributed through a triggered FanOut (i.e. variables from a device are connected to other variables through a trigger, the usual way for poll-type variables) the data read from the receiving end of the variable cannot be considered valid if the DataValidity is faulty. +Additionally, a change to a faulty validity state will signal the availability of new data on those variables, which is to be considered invalid. + + +Bahnhof. Variables which are Constants or outputs of the ConfigReader and are connected to a DeviceModule should be written in an initialisation handler. Currently they are written in ConfigReader::prepare() etc., which might block the application initialisation if an exception occurs in the process of writing these variables. + +-<b>Initialise the device after recovery.</b> + +Description. 
+ +If a device is recovered after an exception, it might need to be reinitialised (e.g. because it was power cycled). The device should be automatically reinitialised after recovery. + +Implementation. + +A list of std::function initialisation handlers is added to the DeviceModule. InitialisationHandlers can be added through the constructor and the addInitialisationHandler() function. When the device recovers, all the initialisationHandlers in the list are executed. +- DeviceModule + + +-<b>Recover process variables after exception.</b> + +Background. + +After a device has failed and recovered, it might have rebooted and lost the values of the process variables that live in the server and are written to the device. Hence these values have to be rewritten after the device has recovered. + +Description. +Technically the issue is that the original value that has been written is not safely accessible when recovering. Inside the accessor the user buffer must not be touched because the recovery is taking place in a different thread. In addition we don't know where the data is (it might or might not have been swapped away, depending on whether write() or writeDestructively() has been called by the user). +The only race-condition-free way is to create a copy when writing the data to the device, so that it is available when recovering. + +Implementation. + +- DeviceModule +- ExceptionHandlingDecorator +A list of TransferElement shared pointers is created as writeRecoveryOpen, which is populated by the function addRecoveryAccessor in the DeviceModule. +ExceptionHandlingDecorator is extended by adding a second accessor to the same register as the target accessor it is decorating, and the data is copied in doPreWrite(). 
+ + + + + +*/ + diff --git a/doc/main.dox b/doc/main.dox index d19c446a662d8eb673c4f54899eaa6365904b438..4643859a56a4afde4d7682151187e27c001e9e12 100644 --- a/doc/main.dox +++ b/doc/main.dox @@ -3,6 +3,7 @@ API documentation: - \subpage exceptionHandling +- \subpage exceptionHandlingDesign Module documentation: - \subpage loggingdoc diff --git a/tests/executables_src/testDeviceExceptionFlagPropagation.cc b/tests/executables_src/testDeviceExceptionFlagPropagation.cc index 4fc907902fd781bb57cfe9d54751f4ea9f8d13ac..56dabcf6bf4515f71dda7cecae4fc7764fd555c8 100644 --- a/tests/executables_src/testDeviceExceptionFlagPropagation.cc +++ b/tests/executables_src/testDeviceExceptionFlagPropagation.cc @@ -4,12 +4,12 @@ using namespace boost::unit_test_framework; #include <ChimeraTK/DummyRegisterAccessor.h> +#include <ChimeraTK/ExceptionDevice.h> #include "Application.h" #include "ApplicationModule.h" #include "ControlSystemModule.h" #include "DeviceModule.h" -#include "ExceptionDevice.h" #include "PeriodicTrigger.h" #include "TestFacility.h" #include "VariableGroup.h" diff --git a/tests/executables_src/testDeviceInitialisationHandler.cc b/tests/executables_src/testDeviceInitialisationHandler.cc index 658bc66dea44e2c085b2b43524559d31c9502ab2..fbd4336c3e14a3c5c4a4951ebbb5d6a71a1c10e9 100644 --- a/tests/executables_src/testDeviceInitialisationHandler.cc +++ b/tests/executables_src/testDeviceInitialisationHandler.cc @@ -10,9 +10,9 @@ #include "DeviceModule.h" //#include "ScalarAccessor.h" #include "TestFacility.h" -#include "ExceptionDevice.h" #include <ChimeraTK/Device.h> +#include <ChimeraTK/ExceptionDevice.h> #include <stdlib.h> #include "check_timeout.h" diff --git a/tests/executables_src/testExceptionDummyDevice.cc b/tests/executables_src/testExceptionDummyDevice.cc deleted file mode 100644 index fd4231e6c50a9665b21fceeb06c317325118df9c..0000000000000000000000000000000000000000 --- a/tests/executables_src/testExceptionDummyDevice.cc +++ /dev/null @@ -1,56 +0,0 @@ -#define 
BOOST_TEST_MODULE testExceptionsDummy - -#include "ExceptionDevice.h" -#include <ChimeraTK/BackendFactory.h> -#include <ChimeraTK/Device.h> -#include <boost/test/included/unit_test.hpp> - -using namespace boost::unit_test_framework; -namespace ctk = ChimeraTK; - -auto exceptionDummy = boost::dynamic_pointer_cast<ExceptionDummy>(ctk::BackendFactory::getInstance().createBackend("(ExceptionDummy:1?map=test3.map)")); -ctk::Device device; - -BOOST_AUTO_TEST_CASE(testExceptionsDummyDevice) { - - // test general function - BOOST_CHECK(!device.isFunctional()); - device.open("(ExceptionDummy:1?map=test3.map)"); - BOOST_CHECK(device.isFunctional()); - - - // test throwExceptionRead - exceptionDummy->throwExceptionRead = true; - BOOST_CHECK(!device.isFunctional()); - BOOST_CHECK_THROW(device.read<int32_t>("/Integers/signed32"),ChimeraTK::runtime_error); - BOOST_CHECK(!device.isFunctional()); - BOOST_CHECK_THROW(device.open("(ExceptionDummy:1?map=test3.map)"),ChimeraTK::runtime_error); - BOOST_CHECK(!device.isFunctional()); - exceptionDummy->throwExceptionRead = false; - BOOST_CHECK(!device.isFunctional()); - device.open("(ExceptionDummy:1?map=test3.map)"); - BOOST_CHECK(device.isFunctional()); - - // test throwExceptionWrite - exceptionDummy->throwExceptionWrite = true; - BOOST_CHECK(!device.isFunctional()); - BOOST_CHECK_THROW(device.write<int32_t>("/Integers/signed32",0),ChimeraTK::runtime_error); - BOOST_CHECK(!device.isFunctional()); - BOOST_CHECK_THROW(device.open("(ExceptionDummy:1?map=test3.map)"),ChimeraTK::runtime_error); - BOOST_CHECK(!device.isFunctional()); - exceptionDummy->throwExceptionWrite = false; - BOOST_CHECK(!device.isFunctional()); - device.open("(ExceptionDummy:1?map=test3.map)"); - BOOST_CHECK(device.isFunctional()); - - // test throwExceptionOpen - exceptionDummy->throwExceptionOpen = true; - BOOST_CHECK(!device.isFunctional()); - BOOST_CHECK_THROW(device.open("(ExceptionDummy:1?map=test3.map)"),ChimeraTK::runtime_error); - 
BOOST_CHECK(!device.isFunctional()); - exceptionDummy->throwExceptionOpen = false; - BOOST_CHECK(!device.isFunctional()); - device.open("(ExceptionDummy:1?map=test3.map)"); - BOOST_CHECK(device.isFunctional()); - -} diff --git a/tests/executables_src/testExceptionHandling.cc b/tests/executables_src/testExceptionHandling.cc index 051c8179f6db17c60490d33a96ede82deced143b..78157d16bf5b46b841fdae357da36c74fb6c2715 100644 --- a/tests/executables_src/testExceptionHandling.cc +++ b/tests/executables_src/testExceptionHandling.cc @@ -9,12 +9,12 @@ #include <ChimeraTK/Device.h> #include <ChimeraTK/NDRegisterAccessor.h> #include <ChimeraTK/DummyRegisterAccessor.h> +#include <ChimeraTK/ExceptionDevice.h> #include "Application.h" #include "ApplicationModule.h" #include "ControlSystemModule.h" #include "DeviceModule.h" -#include "ExceptionDevice.h" #include "ScalarAccessor.h" #include "TestFacility.h" #include "check_timeout.h" diff --git a/tests/executables_src/testProcessVariableRecovery.cc b/tests/executables_src/testProcessVariableRecovery.cc index 2b5aef1a344ea7fc8d314f9ac6ca34c43d691221..42b91d7071eef08b2107d08ad6a89726c94fb03d 100644 --- a/tests/executables_src/testProcessVariableRecovery.cc +++ b/tests/executables_src/testProcessVariableRecovery.cc @@ -5,16 +5,17 @@ #include "ControlSystemModule.h" #include "DeviceModule.h" #include "TestFacility.h" -#include "ExceptionDevice.h" -#include <ChimeraTK/Device.h> -#include <stdlib.h> #include "check_timeout.h" #include "ApplicationModule.h" #include "ArrayAccessor.h" #include "ConfigReader.h" +#include <ChimeraTK/ExceptionDevice.h> +#include <ChimeraTK/Device.h> +#include <stdlib.h> #include <regex> + using namespace boost::unit_test_framework; namespace ctk = ChimeraTK; diff --git a/tests/executables_src/testPropagateDataFaultFlag.cc b/tests/executables_src/testPropagateDataFaultFlag.cc index 13285b77b25ef79453487d47ff18bd63083e7c76..3a25a444baf91d898403c099a2452233fda3adc3 100644 --- 
a/tests/executables_src/testPropagateDataFaultFlag.cc +++ b/tests/executables_src/testPropagateDataFaultFlag.cc @@ -16,10 +16,11 @@ #include "ScalarAccessor.h" #include "ArrayAccessor.h" #include "TestFacility.h" -#include "ExceptionDevice.h" #include "ModuleGroup.h" #include "check_timeout.h" +#include <ChimeraTK/ExceptionDevice.h> + using namespace boost::unit_test_framework; namespace ctk = ChimeraTK; diff --git a/tests/include/ExceptionDevice.h b/tests/include/ExceptionDevice.h deleted file mode 100644 index 359a2e8fb2b601f578aa6b036bc060c52e569f74..0000000000000000000000000000000000000000 --- a/tests/include/ExceptionDevice.h +++ /dev/null @@ -1,64 +0,0 @@ -#include <ChimeraTK/BackendFactory.h> -#include <ChimeraTK/DeviceAccessVersion.h> -#include <ChimeraTK/DummyBackend.h> - -class ExceptionDummy : public ChimeraTK::DummyBackend { - public: - ExceptionDummy(std::string mapFileName) : DummyBackend(mapFileName) {} - bool throwExceptionOpen{false}; - bool throwExceptionRead{false}; - bool throwExceptionWrite{false}; - bool thereHaveBeenExceptions{false}; - - static boost::shared_ptr<DeviceBackend> createInstance(std::string, std::map<std::string, std::string> parameters) { - return boost::shared_ptr<DeviceBackend>(new ExceptionDummy(parameters["map"])); - } - - void open() override { - if(throwExceptionOpen) { - thereHaveBeenExceptions = true; - throw(ChimeraTK::runtime_error("DummyException: This is a test")); - } - ChimeraTK::DummyBackend::open(); - if(throwExceptionRead || throwExceptionWrite) { - thereHaveBeenExceptions = true; - throw(ChimeraTK::runtime_error("DummyException: open throws because of device error when already open.")); - } - thereHaveBeenExceptions = false; - } - - void read(uint8_t bar, uint32_t address, int32_t* data, size_t sizeInBytes) override { - if(throwExceptionRead) { - thereHaveBeenExceptions = true; - throw(ChimeraTK::runtime_error("DummyException: read throws by request")); - } - ChimeraTK::DummyBackend::read(bar, address, data, 
sizeInBytes); - } - - void write(uint8_t bar, uint32_t address, int32_t const* data, size_t sizeInBytes) override { - if(throwExceptionWrite) { - thereHaveBeenExceptions = true; - throw(ChimeraTK::runtime_error("DummyException: write throws by request")); - } - ChimeraTK::DummyBackend::write(bar, address, data, sizeInBytes); - } - - bool isFunctional() const override { - return (_opened && !throwExceptionOpen && !throwExceptionRead && !throwExceptionWrite && !thereHaveBeenExceptions); - } - - class BackendRegisterer { - public: - BackendRegisterer(); - }; - static BackendRegisterer backendRegisterer; -}; - -ExceptionDummy::BackendRegisterer ExceptionDummy::backendRegisterer; - -ExceptionDummy::BackendRegisterer::BackendRegisterer() { - std::cout << "ExceptionDummy::BackendRegisterer: registering backend type " - "ExceptionDummy" - << std::endl; - ChimeraTK::BackendFactory::getInstance().registerBackendType("ExceptionDummy", &ExceptionDummy::createInstance); -}
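Note on the recovery steps described in doc/exceptionHandlingDesign.dox above: the following is a minimal, single-threaded C++ sketch of the two ideas, running initialisation handlers after recovery and rewriting values from a copy taken at write time. It is not ApplicationCore code. `ToyDevice`, `ToyDeviceModule`, `write()` and `recover()` are hypothetical names invented for illustration; only `addInitialisationHandler()` mirrors a name from the design page, and its signature here is an assumption. The order (handlers first, then recovered values) is also an assumption, and a real implementation would additionally need thread synchronisation, which the sketch omits.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device; this is NOT the ChimeraTK API.
struct ToyDevice {
  std::map<std::string, int> registers;
  // Simulates a power cycle: the device forgets all register values.
  void powerCycle() { registers.clear(); }
};

// Hypothetical, simplified model of the recovery part of a DeviceModule.
class ToyDeviceModule {
 public:
  using InitialisationHandler = std::function<void(ToyDevice&)>;

  // A handler can be passed at construction time (assumed signature).
  explicit ToyDeviceModule(ToyDevice& device, InitialisationHandler handler = nullptr)
  : _device(device) {
    if(handler) _handlers.push_back(std::move(handler));
  }

  // Further handlers can be registered later.
  void addInitialisationHandler(InitialisationHandler handler) {
    _handlers.push_back(std::move(handler));
  }

  // Mimics the copy made at write time: remember the value, then write it.
  void write(const std::string& name, int value) {
    _recoveryCopies[name] = value;
    _device.registers[name] = value;
  }

  // To be called once the device can be opened again.
  void recover() {
    // Step 1 (assumed order): re-initialise the device.
    for(auto& handler : _handlers) handler(_device);
    // Step 2: restore the last written value of every process variable.
    for(auto& entry : _recoveryCopies) _device.registers[entry.first] = entry.second;
  }

 private:
  ToyDevice& _device;
  std::vector<InitialisationHandler> _handlers;
  std::map<std::string, int> _recoveryCopies;
};
```

A caller would construct the module with an optional handler, call `write()` during normal operation, and call `recover()` once the device can be opened again. Because the values are taken from the module's own copies, the user buffer is never touched during recovery, which is the point made in the design page.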