Implement Sending Metrics via CMDP

Simon Spannagel requested to merge p-metrics into main

This MR implements a metrics manager which allows any code in the framework to send metrics via CMDP.

There are two types of metrics available:

  • "Timed metric" is registered with a timeout interval. No matter how often the metric has been updated in the meantime, a message is only sent after this minimum interval has passed. If the value is unchanged, no message is sent.
  • "Triggered metric" is registered with a number of triggers. This metric emits a message only after the setMetrics function has been called N times, and only if its value has changed. This is useful e.g. for "send every 100th event" for DQM.

Metrics are only sent at all if they have changed compared to the last message sent.
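
To make the two emission rules concrete, here is a minimal sketch of the decision logic. It is not the code from this MR: the names `MetricSketch`, `TimedSketch`, `TriggeredSketch` and `poll` are hypothetical, the value is simplified to a `double`, and aggregation types such as `metrics::Type::AVERAGE` are left out. It also assumes that the interval or trigger counter is re-armed even when the value turned out to be unchanged, which the description above does not specify.

```cpp
#include <chrono>
#include <cstddef>
#include <optional>

// Hypothetical sketch, not the classes from this MR.
using Clock = std::chrono::steady_clock;

// Common part: keep the current value and the last value that was sent.
class MetricSketch {
public:
    virtual ~MetricSketch() = default;

    void set(double value) {
        value_ = value;
        on_set();
    }

    // Returns the value to publish, or nothing if the metric is not due
    // or its value equals the last value that was sent.
    std::optional<double> poll(Clock::time_point now) {
        if(!value_.has_value() || !due(now)) {
            return std::nullopt;
        }
        rearm(now); // assumption: re-arm even if nothing is sent
        if(value_ == last_sent_) {
            return std::nullopt;
        }
        last_sent_ = value_;
        return value_;
    }

protected:
    virtual void on_set() {}
    virtual bool due(Clock::time_point now) const = 0;
    virtual void rearm(Clock::time_point now) = 0;

private:
    std::optional<double> value_;
    std::optional<double> last_sent_;
};

// Due once the minimum interval since the last check-out has passed.
class TimedSketch : public MetricSketch {
public:
    TimedSketch(Clock::duration interval, Clock::time_point start) : interval_(interval), last_(start) {}

protected:
    bool due(Clock::time_point now) const override { return now - last_ >= interval_; }
    void rearm(Clock::time_point now) override { last_ = now; }

private:
    Clock::duration interval_;
    Clock::time_point last_;
};

// Due once set() has been called the configured number of times.
class TriggeredSketch : public MetricSketch {
public:
    explicit TriggeredSketch(std::size_t triggers) : triggers_(triggers) {}

protected:
    void on_set() override { ++count_; }
    bool due(Clock::time_point /*now*/) const override { return count_ >= triggers_; }
    void rearm(Clock::time_point /*now*/) override { count_ = 0; }

private:
    std::size_t triggers_;
    std::size_t count_ {0};
};
```

Passing the time point into `poll` instead of reading the clock inside keeps the logic easy to unit-test; a manager thread would call `poll(Clock::now())` for every registered metric and publish whatever comes back.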

The interface in a satellite looks quite neat:

prototype::prototype(std::string_view type_name, std::string_view satellite_name) : Satellite(type_name, satellite_name) {
    register_timed_metric("CPULOAD", 3s, metrics::Type::AVERAGE);
}

void prototype::running(const std::stop_token& stop_token) {
    while(!stop_token.stop_requested()) {
        set_metric("CPULOAD", random());
    }
}

This is still a draft but I would like some first review and ideas for the open things:

  • Some cleanup: have a look at CMDP1Message, I need to be able to set the payload later - but this implies that we could end up attempting to send a message without payload.
  • Revisiting the CMDP protocol, I would drop the description and unit. This is something that goes into the documentation and has to be configured on the receiving side I would say. Description definitely, unit we could discuss. Needs protocol change, see !148 (merged).
  • I have no clue how to pass the ZMQ publisher to the metrics manager right now. Ideas welcome.
  • Is the ZMQ socket thread-safe, i.e. can we even hammer it from two endpoints? Looks like it is not, according to the ZMQ Guide.
  • Is the waiting-and-locking scheme alright? I somehow got lost in the middle of predicates and wait_until. It seems to work for me 😄
  • Better error handling / exceptions for already registered metrics, for missing metrics, ...
  • Allow tying metrics to specific FSM states similar to what !128 (merged) can do
Edited by Simon Spannagel
