
Improvement iteration for under- and new overcurrent motor failure detection #26262

Merged
MaEtUgR merged 5 commits into main from maetugr/motor-failure-detection
Feb 9, 2026

Member

@MaEtUgR MaEtUgR commented Jan 13, 2026

Solved Problem

Motor failure detection based on current has been supported in PX4 since #19570, but it has some real-world limitations, e.g.:

  • If the ESCs are not indexed contiguously from zero up to the ESC count, the logic breaks at multiple points and reports odd "ESC offline 156" messages. This shows up quickly when bench testing with only a subset of UAVCAN motors attached.
  • There is often an offset in the formula that gives the expected steady-state current from the commanded thrust (the graph doesn't go exactly through zero).
  • There's no upper limit for the current, so a rotor that gets stuck or otherwise shows too high a load is not detected.
  • The detection is enabled by default, so anyone setting up a vehicle with telemetry gets motor failure reports even though the defaults are not necessarily sensible for a real-world setup.
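
The offset and the missing upper limit from the list above can be pictured as a band around an expected-current model. The following is only a minimal sketch, assuming a linear relation between command and steady-state current. The struct, its fields, and `classifyCurrent` are hypothetical names for illustration and are not the PX4 parameters or the code in this PR.

```cpp
#include <cassert>

// Hypothetical model of the expected steady-state current (not PX4 parameters).
struct CurrentModel {
	float gain_a;         // additional expected current [A] at full command
	float offset_a;       // expected current [A] at zero command (line need not pass through zero)
	float under_margin_a; // allowed deviation below the expected current [A]
	float over_margin_a;  // allowed deviation above the expected current [A]
};

enum class CurrentState { Ok, Undercurrent, Overcurrent };

// Classify one measurement against the band around the assumed linear model.
CurrentState classifyCurrent(const CurrentModel &model, float command, float current_a)
{
	assert(command >= 0.f && command <= 1.f); // normalized command assumed

	const float expected_a = model.gain_a * command + model.offset_a;

	if (current_a < expected_a - model.under_margin_a) {
		return CurrentState::Undercurrent; // e.g. a motor that is not spinning
	}

	if (current_a > expected_a + model.over_margin_a) {
		return CurrentState::Overcurrent; // e.g. a stuck rotor or excessive load
	}

	return CurrentState::Ok;
}
```

With a gain of 20 A, an offset of 1 A, and margins of 3 A below and 5 A above, a command of 0.5 gives an expected 11 A, so measurements between 8 A and 16 A would count as healthy.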

Solution

This is only my first iteration of improving the logic.
I'll follow up with more iterations to:

  • clean up the configuration e.g. FD_ACT_MOT_THR might have become obsolete
  • merge offline check with the one escCheck providing user feedback and UAVCAN driver
  • probably move this out of the failure detector 🤔
  • more functionality to check on e.g. temperature and refine based on real world testing and more data

Changelog Entry

Improvement iteration for under- and new overcurrent motor failure detection

Test coverage

Only bench testing and comparison to real-world flight log data so far.

Context

I made this current check implementation based on real world data analysis which looked like this. Upper bound yellow (imagine a line), lower bound red (imagine a line), blue points are actual data points of current at a certain command during the flight.
[Figure: measured current versus command from flight log data, with upper and lower bounds]

@MaEtUgR MaEtUgR self-assigned this Jan 13, 2026

github-actions bot commented Jan 13, 2026

🔎 FLASH Analysis

px4_fmu-v5x [Total VM Diff: 168 byte (0.01 %)]
    FILE SIZE        VM SIZE    
--------------  -------------- 
+0.0%    +168  +0.0%    +168    .text
  +0.1%    +116  +0.1%    +116    g_cromfs_image
  [NEW]     +40  [NEW]     +40    CSWTCH.34
  +6.8%     +40  +6.8%     +40    FailureDetector::FailureDetector()
  +0.0%     +40  +0.0%     +40    [section .text]
   +13%     +24   +13%     +24    FailureDetector::updateParamsImpl()
  +0.1%     +16  +0.1%     +16    px4::parameters
  [NEW]      +4  [NEW]      +4    CSWTCH.848
  +1.4%      +2  +1.4%      +2    Commander::updateTunes()
  -2.9%      -2  -2.9%      -2    Commander::checkWorkerThread()
  [DEL]      -4  [DEL]      -4    CSWTCH.844
  -0.7%      -4  -0.7%      -4    EscChecks::checkEscStatus()
  -1.0%      -4  -1.0%      -4    FailureDetector::updateImbalancedPropStatus()
  -4.5%      -4  -4.5%      -4    FlightTask
  -5.0%      -8  -5.0%      -8    FailureDetector::publishStatus()
  [DEL]     -40  [DEL]     -40    CSWTCH.30
  -8.8%     -48  -8.8%     -48    FailureDetector::updateMotorStatus()
+0.0%    +233  [ = ]       0    .debug_abbrev
+0.0%     +48  [ = ]       0    .debug_frame
+0.0% +6.37Ki  [ = ]       0    .debug_info
+0.0%    +311  [ = ]       0    .debug_line
  +100%      +2  [ = ]       0    [Unmapped]
  +0.0%    +309  [ = ]       0    [section .debug_line]
-0.0%     -97  [ = ]       0    .debug_loclists
-0.0%     -33  [ = ]       0    .debug_rnglists
 -66.7%      -2  [ = ]       0    [Unmapped]
  -0.0%     -31  [ = ]       0    [section .debug_rnglists]
+0.0% +1.74Ki  [ = ]       0    .debug_str
-2.0%    -168  [ = ]       0    [Unmapped]
+0.0% +8.56Ki  +0.0%    +168    TOTAL

px4_fmu-v6x [Total VM Diff: 72 byte (0 %)]
    FILE SIZE        VM SIZE    
--------------  -------------- 
+0.0%     +72  +0.0%     +72    .text
  +0.1%    +128  +0.1%    +128    g_cromfs_image
  +0.0%     +44  +0.0%     +44    [section .text]
  [NEW]     +40  [NEW]     +40    CSWTCH.34
  +6.8%     +40  +6.8%     +40    FailureDetector::FailureDetector()
   +13%     +24   +13%     +24    FailureDetector::updateParamsImpl()
  +0.1%     +16  +0.1%     +16    px4::parameters
  [NEW]      +4  [NEW]      +4    CSWTCH.848
   +44%      +4   +44%      +4    g_nullstring
  [DEL]      -4  [DEL]      -4    CSWTCH.844
  -5.6%      -4  -5.6%      -4    ConstLayer::containedAsBitset()
 -25.0%      -4 -25.0%      -4    ConstLayer::contains()
 -14.3%      -4 -14.3%      -4    ConstLayer::get()
  -0.7%      -4  -0.7%      -4    EscChecks::checkEscStatus()
  -1.0%      -4  -1.0%      -4    FailureDetector::updateImbalancedPropStatus()
  -2.5%      -4  -2.5%      -4    param_find_internal()
  -5.0%      -8  -5.0%      -8    FailureDetector::publishStatus()
  -2.6%      -8  -2.6%      -8    param_hash_check
  -4.5%     -20  -4.5%     -20    param_reset_specific
  [DEL]     -40  [DEL]     -40    CSWTCH.30
  -8.8%     -48  -8.8%     -48    FailureDetector::updateMotorStatus()
 -101.9%     -76 -101.9%     -76    [21 Others]
+0.0%    +233  [ = ]       0    .debug_abbrev
+0.0%     +28  [ = ]       0    .debug_frame
+0.0% +6.24Ki  [ = ]       0    .debug_info
+0.0%    +213  [ = ]       0    .debug_line
-0.0%    -168  [ = ]       0    .debug_loclists
-0.0%     -42  [ = ]       0    .debug_rnglists
  [NEW]      +2  [ = ]       0    [Unmapped]
  -0.0%     -44  [ = ]       0    [section .debug_rnglists]
+0.0% +1.74Ki  [ = ]       0    .debug_str
-1.2%     -72  [ = ]       0    [Unmapped]
+0.0% +8.24Ki  +0.0%     +72    TOTAL

Updated: 2026-02-09T12:47:04

@dakejahl
Contributor

If the ESCs are not from zero to count the logic would break at multiple points and report weird ESC offline 156 messages.

Addressed in #26263
https://github.com/PX4/PX4-Autopilot/pull/26263/changes#diff-b1bcce3805e298f2466d3c57682545f35110056fef391668cf5d76f7803d5312

dakejahl previously approved these changes Jan 14, 2026
Contributor

@dakejahl dakejahl left a comment

A few comments but otherwise LGTM

Comment thread src/modules/commander/failure_detector/failure_detector_params.c
Comment thread src/modules/commander/failure_detector/failure_detector_params.c
Comment thread src/modules/commander/failure_detector/FailureDetector.hpp
esc_fail_msg[sizeof(esc_fail_msg) - 1] = '\0';
}
for (int i = 0; i < esc_status_s::CONNECTED_ESC_MAX; ++i) {
const bool mapped = math::isInRange(esc_status.esc[i].actuator_function, actuator_motors_s::ACTUATOR_FUNCTION_MOTOR1,
Contributor

Suggested change
const bool mapped = math::isInRange(esc_status.esc[i].actuator_function, actuator_motors_s::ACTUATOR_FUNCTION_MOTOR1,
const bool is_motor = math::isInRange(esc_status.esc[i].actuator_function, actuator_motors_s::ACTUATOR_FUNCTION_MOTOR1,

Member Author

is mapped as motor? I'll check 🤔 👍

Member Author

@MaEtUgR MaEtUgR Feb 9, 2026

I think you looked at the check and saw that it checks for being mapped to a motor, whereas I looked at what's interesting: is it mapped at all? Both are valid. I think we should combine the names in the next iteration; I want to unblock @ttechnick's work.

}

for (int index = 0; index < esc_status.esc_count; index++) {
for (int index = 0; index < math::min(esc_status.esc_count, esc_status_s::CONNECTED_ESC_MAX); ++index) {
Contributor

Suggested change
for (int index = 0; index < math::min(esc_status.esc_count, esc_status_s::CONNECTED_ESC_MAX); ++index) {
for (int index = 0; index < esc_status_s::CONNECTED_ESC_MAX; ++index) {

You should still iterate over esc_status_s::CONNECTED_ESC_MAX. Technically you could configure Motor1 and Motor2 on actuator channels 1 & 2 and Motor3 and Motor4 on channels 5 & 6. The EscStatus.esc[8] message is in actuator order.

Member

Can we keep it the proposed way, as it is more conservative and change it to your suggested way in the next iteration?

Member Author

We need to document more clearly how the array and esc_count work in the esc_status message. This implementation is based on the assumption that the count gives the number of valid array entries from the beginning. I thought that's currently also the case, but we need to check the details.

Contributor

EscStatus.esc_count is the total number of valid entries, but they are not necessarily sequential. The EscReport[8] esc array is in actuator order, not motor order, hence the EscReport.actuator_function field.
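
To make the difference between the two loop bounds in this thread concrete, here is a small sketch using simplified stand-in types. The structs, the constants `kMaxEsc`, `kFunctionMotor1` and `kFunctionMotorLast`, and the helper `mappedMotorMask` are assumptions for illustration only and are not the generated uORB types or the code in this PR.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kMaxEsc = 8;                 // mirrors the esc[8] array mentioned above
constexpr uint8_t kFunctionMotor1 = 101;   // assumed value of the first motor function
constexpr uint8_t kFunctionMotorLast = 112; // assumed: 12 motor functions in total

// Simplified stand-ins for the ESC status message.
struct EscReport {
	uint8_t actuator_function; // 0 means the slot is not mapped to any function
};

struct EscStatus {
	uint8_t esc_count;         // number of valid entries, not necessarily contiguous
	EscReport esc[kMaxEsc];    // slots in actuator order
};

// Collect which motors are mapped within the first `slots` array entries.
// Bit 0 of the result corresponds to Motor1.
uint16_t mappedMotorMask(const EscStatus &status, int slots)
{
	assert(slots >= 0 && slots <= kMaxEsc); // never index past the array

	uint16_t mask = 0;

	for (int i = 0; i < slots; ++i) {
		const uint8_t function = status.esc[i].actuator_function;
		const bool is_motor = function >= kFunctionMotor1 && function <= kFunctionMotorLast;

		if (is_motor) {
			mask = static_cast<uint16_t>(mask | (1u << (function - kFunctionMotor1)));
		}
	}

	return mask;
}
```

For the configuration described in the review (Motor1 and Motor2 in slots 0 and 1, Motor3 and Motor4 in slots 4 and 5, esc_count of 4), bounding the loop by esc_count finds only two motors, while scanning all slots finds all four.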

Comment thread src/modules/commander/failure_detector/FailureDetector.cpp Outdated
Comment thread src/modules/commander/failure_detector/FailureDetector.cpp
Comment thread src/modules/commander/failure_detector/FailureDetector.cpp
@ttechnick
Member

The buffer overflow issue should be addressed here, but the timeout can also be fixed in a later iteration :)

Comment thread src/modules/commander/failure_detector/FailureDetector.cpp Outdated
MaEtUgR and others added 5 commits February 9, 2026 13:38
previous to this
09d79b2
set `esc_online_flags` e.g. for UAVCAN ESCs which specific one is online and that then got compared to a mask where the first `esc_count` bits were set.

So if only ESC 5 is mapped and online you get the message "ESC 156 offline" because `esc_online_flags = 0b1000` gets compared to `online_bitmask = 0b1` based on `esc_count = 1` and the motor index is `esc[0].actuator_function = 0` wrapped using `0 - actuator_motors_s::ACTUATOR_FUNCTION_MOTOR1 + 1 = 156`.
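
The arithmetic in this commit message can be reproduced in isolation. The sketch below assumes an 8-bit unsigned motor index and a value of 101 for the first motor function, which is what the quoted result of 156 implies. The function names are hypothetical and only illustrate the two effects, not the actual FailureDetector code.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t kFunctionMotor1 = 101; // assumed; consistent with 0 - 101 + 1 wrapping to 156

// Motor number derived from a slot without first checking that the slot is mapped.
// For an unmapped slot (actuator_function == 0) the result wraps around in uint8_t.
uint8_t reportedMotorNumber(uint8_t actuator_function)
{
	return static_cast<uint8_t>(actuator_function - kFunctionMotor1 + 1);
}

// Mask with the lowest `esc_count` bits set, as used by the previous online check.
uint8_t expectedOnlineMask(uint8_t esc_count)
{
	assert(esc_count <= 8); // at most 8 ESC slots
	return static_cast<uint8_t>((1u << esc_count) - 1u);
}
```

With a single ESC that sets a higher bit such as `0b1000` in the online flags, `expectedOnlineMask(1)` yields `0b1`, the two never overlap, and the ESC is treated as offline even though it is reporting.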
@ttechnick ttechnick force-pushed the maetugr/motor-failure-detection branch from 71e531b to 612a160 Compare February 9, 2026 12:38

sonarqubecloud bot commented Feb 9, 2026

❌ The last analysis has failed.

See analysis details on SonarQube Cloud

@ttechnick ttechnick marked this pull request as ready for review February 9, 2026 15:27

Member Author

MaEtUgR commented Feb 9, 2026

Thanks for the reviews! They were direly necessary (array overflow 🙈). I think by now we can merge this iteration to allow progressing to the next one: #26420

@MaEtUgR MaEtUgR merged commit dbb00d5 into main Feb 9, 2026
76 checks passed
@MaEtUgR MaEtUgR deleted the maetugr/motor-failure-detection branch February 9, 2026 17:08


3 participants