Skip to content

apc_modbus: simplify error handling with read retries#3414

Open
EchterAgo wants to merge 1 commit intonetworkupstools:masterfrom
EchterAgo:apc_modbus_read_retries
Open

apc_modbus: simplify error handling with read retries#3414
EchterAgo wants to merge 1 commit intonetworkupstools:masterfrom
EchterAgo:apc_modbus_read_retries

Conversation

@EchterAgo
Copy link
Copy Markdown
Contributor

@EchterAgo EchterAgo commented Apr 14, 2026

Replace the complex _apc_modbus_handle_error function with a simpler retry mechanism built into _apc_modbus_read_registers.

The new approach:

  • Retries register reads on ETIMEDOUT errors up to modbus_retries times (configurable, default 3)
  • On non-timeout errors or after retry exhaustion, closes the connection for reconnection on the next update cycle
  • Removes the platform-specific (WIN32/POSIX) timeout detection and the flush-based recovery that didn't work anyway (flush is already done in _apc_modbus_reopen upon reconnection)

Also adds a modbus_retries driver option to configure the number of read retry attempts, and improves logging for connection open/close events.

This change was inspired by a patch by @marcan to do the same and from testing the behaviour of apcupsd, noticing that on my USB unit it times out on read and only succeeds on the first retry.

General points

  • Described the changes in the PR submission or a separate issue, e.g.
    known published or discovered protocols, applicable hardware (expected
    compatible and actually tested/developed against), limitations, etc.

  • There may be multiple commits in the PR, aligned and commented with
    a functional change. Notably, coding style changes better belong in a
    separate PR, but certainly in a dedicated commit to simplify reviews
    of "real" changes in the other commits. Similarly for typo fixes in
    comments or text documents.

  • Use of coding helper tools and AI should be disclosed in the commit
    or PR comments (it is interesting to know which ones do a decent job).
    As with other contributions, a human is responsible and thanked for the
    quality and content of the change, and is presumed to have the right to
    post that code to be published further under the project's license terms.

  • Please star NUT on GitHub, this helps with sponsorships! ;)

Frequent "underwater rocks" for driver addition/update PRs

  • Revised existing driver families and added a sub-driver if applicable
    (nutdrv_qx, usbhid-ups...) or added a brand new driver in the other
    case.

  • Did not extend obsoleted drivers with new hardware support features
    (notably blazer and other single-device family drivers for Qx protocols,
    except the new nutdrv_qx which should cover them all).

  • For updated existing device drivers, bumped the DRIVER_VERSION macro
    or its equivalent.

  • For USB devices (HID or not), revised that the driver uses unique
    VID/PID combinations, or raised discussions when this is not the case
    (several vendors do use same interface chips for unrelated protocols).

  • For new USB devices, built and committed the changes for the
    scripts/upower/95-upower-hid.hwdb file

  • Proposed NUT data mapping is aligned with existing docs/nut-names.txt
    file. If the device exposes useful data points not listed in the file, the
    experimental.* namespace can be used as documented there, and discussion
    should be raised on the NUT Developers mailing list to standardize the new
    concept.

  • Updated data/driver.list.in if applicable (new tested device info)

Frequent "underwater rocks" for general C code PRs

  • Did not "blindly assume" default integer type sizes and value ranges,
    structure layout and alignment in memory, endianness (layout of bytes and
    bits in memory for multi-byte numeric types), or use of generic int where
    language or libraries dictate the use of size_t (or ssize_t sometimes).
  • Progress and errors are handled with upsdebugx(), upslogx(),
    fatalx() and related methods, not with direct printf() or exit().
    Similarly, NUT helpers are used for error-checked memory allocation and
    string operations (except where customized error handling is needed,
    such as unlocking device ports, etc.)

  • Coding style (including whitespace for indentations) follows precedent
    in the code of the file, and examples/guide in docs/developers.txt file.

  • For newly added files, the Makefile.am recipes were updated and the
    make distcheck target passes.

General documentation updates

  • Added a bullet point into NEWS.adoc, possibly also UPGRADING.adoc
    if there is something packagers or custom-build users should take into
    account (new driver categories, configuration options, dependencies...)

  • Updated docs/acknowledgements.txt (for vendor-backed device support)

  • Added or updated manual page information in docs/man/*.txt files
    and corresponding recipe lists in docs/man/Makefile.am for new pages

  • Passed make spellcheck, updated spell-checking dictionary in the
    docs/nut.dict file if needed (did not remove any words -- the make
    rule printout in case of changes suggests how to maintain it).

Additional work may be needed after posting this PR

  • Propose a PR for NUT DDL with detailed device data dumps from tests
    against real hardware (the more models, the better).

  • Address NUT CI farm build failures for the PR: testing on numerous
    platforms and toolkits can expose issues not seen on just one system.

  • Revise suggestions from LGTM.COM analysis about "new issues" with
    the changed codebase.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 14, 2026

A ZIP file with standard source tarball and another tarball with pre-built docs for commit 8ef4491 is temporarily available: NUT-tarballs-PR-3414.zip.

@EchterAgo EchterAgo marked this pull request as draft April 14, 2026 09:14
@EchterAgo
Copy link
Copy Markdown
Contributor Author

I'd like to test this myself some more and am hoping others can test it as well.

@AppVeyorBot
Copy link
Copy Markdown

Build nut 2.8.5.4559-master completed (commit 2ece70fac0 by @EchterAgo)

@jimklimov jimklimov added enhancement APC modbus apcupsd Feature comparison, or possible inspiration from, the sibling project. impacts-release-2.8.5 Issues reported against NUT release 2.8.5 (maybe vanilla or with minor packaging tweaks) labels Apr 14, 2026
@jimklimov jimklimov added this to the 2.8.6 milestone Apr 14, 2026
Replace the complex `_apc_modbus_handle_error` function with a simpler
retry mechanism built into `_apc_modbus_read_registers`.

The new approach:
- Retries register reads on `ETIMEDOUT` errors up to `modbus_retries`
  times (configurable, default 3)
- On non-timeout errors or after retry exhaustion, closes the connection
  for reconnection on the next update cycle
- Removes the platform-specific (WIN32/POSIX) timeout detection and the
  flush-based recovery that didn't work anyway (flush is already done
  in `_apc_modbus_reopen` upon reconnection)

Also adds a `modbus_retries` driver option to configure the number of
read retry attempts, and improves logging for connection open/close
events.

This change was inspired by a patch by @marcan to do the same and from
testing the behaviour of `apcupsd`, noticing that on my USB unit it
times out on read and only succeeds on the first retry.

Signed-off-by: Axel Gembe <axel@gembe.net>
@EchterAgo EchterAgo force-pushed the apc_modbus_read_retries branch from 86bc677 to 8ef4491 Compare April 14, 2026 11:26
@EchterAgo EchterAgo marked this pull request as ready for review April 14, 2026 11:32
@EchterAgo
Copy link
Copy Markdown
Contributor Author

I have encountered a potential issue with Modbus TCP that I want to further investigate.

@EchterAgo EchterAgo marked this pull request as draft April 14, 2026 12:01
@EchterAgo
Copy link
Copy Markdown
Contributor Author

I think our timeout value is marginal for Modbus TCP:

Apr 14 19:10:44 ares nut-driver@ups[2002143]: _apc_modbus_read_registers: Read of 0:27 failed: Connection timed out (192.168.0.102:502)
Apr 14 19:10:44 ares apc_modbus[2002143]: _apc_modbus_read_registers: Read of 0:27 failed: Connection timed out (192.168.0.102:502)
Apr 14 19:10:44 ares nut-driver@ups[2002143]: _apc_modbus_read_registers: Read of 0:27 failed: Invalid data (192.168.0.102:502)
Apr 14 19:10:44 ares nut-driver@ups[2002143]: _apc_modbus_close: Closing connection
Apr 14 19:10:44 ares apc_modbus[2002143]: _apc_modbus_read_registers: Read of 0:27 failed: Invalid data (192.168.0.102:502)
Apr 14 19:10:44 ares apc_modbus[2002143]: _apc_modbus_close: Closing connection
Apr 14 19:10:47 ares nut-driver@ups[2002143]: Opened modbus successfully
Apr 14 19:10:47 ares apc_modbus[2002143]: Opened modbus successfully

capture:

No.	Time	Source	Destination	Protocol	Length	Info
3636	2026-04-14 19:10:43.842072	192.168.0.40	192.168.0.102	Modbus/TCP	66	   Query: Trans: 12622; Unit:   1, Func:   3: Read Holding Registers
3637	2026-04-14 19:10:44.018369	192.168.0.102	192.168.0.40	TCP	60	502 → 40392 [ACK] Seq=70441 Ack=11905 Win=3660 Len=0
3638	2026-04-14 19:10:44.377723	192.168.0.40	192.168.0.102	Modbus/TCP	66	   Query: Trans: 12623; Unit:   1, Func:   3: Read Holding Registers
3639	2026-04-14 19:10:44.578103	192.168.0.102	192.168.0.40	TCP	60	502 → 40392 [ACK] Seq=70441 Ack=11917 Win=3648 Len=0
3640	2026-04-14 19:10:44.579449	192.168.0.102	192.168.0.40	Modbus/TCP	117	Response: Trans: 12622; Unit:   1, Func:   3: Read Holding Registers
3641	2026-04-14 19:10:44.579482	192.168.0.40	192.168.0.102	TCP	54	40392 → 502 [ACK] Seq=11917 Ack=70504 Win=63 Len=0
3642	2026-04-14 19:10:44.579604	192.168.0.40	192.168.0.102	TCP	54	40392 → 502 [FIN, ACK] Seq=11917 Ack=70504 Win=63 Len=0

3636 is the first read, 3638 is the retry after 500ms without response, both are ACKed. 3640 is the actual response to transaction 12622, which arrives 700ms after the initial packet. At this point we expect a response to transaction 12623, which is why we fail with "invalid data".

I think the 500ms timeout is fine for local connections, but I think we should increase it for TCP connections.

@EchterAgo EchterAgo marked this pull request as ready for review April 14, 2026 14:43
@EchterAgo
Copy link
Copy Markdown
Contributor Author

I think this is ready too, the TCP issues are addressed by #3418

@AppVeyorBot
Copy link
Copy Markdown

@AppVeyorBot
Copy link
Copy Markdown

@AppVeyorBot
Copy link
Copy Markdown

Build nut 2.8.5.4593-master completed (commit 31b574a521 by @EchterAgo)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

APC apcupsd Feature comparison, or possible inspiration from, the sibling project. enhancement impacts-release-2.8.5 Issues reported against NUT release 2.8.5 (maybe vanilla or with minor packaging tweaks) modbus

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants