apc_modbus: simplify error handling with read retries #3414
EchterAgo wants to merge 1 commit into networkupstools:master
|
A ZIP file with standard source tarball and another tarball with pre-built docs for commit 8ef4491 is temporarily available: NUT-tarballs-PR-3414.zip. |
|
I'd like to test this myself some more and am hoping others can test it as well. |
|
✅ Build nut 2.8.5.4559-master completed (commit 2ece70fac0 by @EchterAgo)
|
Replace the complex `_apc_modbus_handle_error` function with a simpler retry mechanism built into `_apc_modbus_read_registers`. The new approach:

- Retries register reads on `ETIMEDOUT` errors up to `modbus_retries` times (configurable, default 3)
- On non-timeout errors or after retry exhaustion, closes the connection for reconnection on the next update cycle
- Removes the platform-specific (WIN32/POSIX) timeout detection and the flush-based recovery that didn't work anyway (flush is already done in `_apc_modbus_reopen` upon reconnection)

Also adds a `modbus_retries` driver option to configure the number of read retry attempts, and improves logging for connection open/close events.

This change was inspired by a patch by @marcan to do the same and from testing the behaviour of `apcupsd`, noticing that on my USB unit it times out on read and only succeeds on the first retry.

Signed-off-by: Axel Gembe <axel@gembe.net>
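The retry behaviour described in the commit message can be sketched roughly as follows. This is not the driver's actual code: `read_with_retries`, the callback types and the `fake_dev` simulation are hypothetical names invented for illustration, and the sketch assumes that "up to `modbus_retries` times" means one initial attempt plus that many retries.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical read callback: returns the number of registers read,
 * or -1 with errno set (mirroring the usual C error convention). */
typedef int (*read_fn)(void *ctx, int addr, int nb, unsigned short *dest);

/* Hypothetical close callback, standing in for closing the connection
 * so that it can be reopened on the next update cycle. */
typedef void (*close_fn)(void *ctx);

/* Returns 1 on success, 0 on failure (connection has been closed).
 * Assumption: one initial attempt plus up to `retries` retries. */
static int read_with_retries(void *ctx, read_fn do_read, close_fn do_close,
                             int addr, int nb, unsigned short *dest,
                             int retries)
{
    int attempt;

    for (attempt = 0; attempt <= retries; attempt++) {
        errno = 0;
        if (do_read(ctx, addr, nb, dest) == nb)
            return 1;
        if (errno != ETIMEDOUT)
            break; /* non-timeout error: retrying is not expected to help */
    }

    /* Non-timeout error or retries exhausted: close and reconnect later. */
    do_close(ctx);
    return 0;
}

/* Simulated device for illustration only: times out `timeouts_left`
 * times, then answers with zeroed registers. */
struct fake_dev {
    int timeouts_left;
    int reads;
    int closed;
};

static int fake_read(void *ctx, int addr, int nb, unsigned short *dest)
{
    struct fake_dev *dev = ctx;
    int i;

    (void)addr;
    dev->reads++;
    if (dev->timeouts_left > 0) {
        dev->timeouts_left--;
        errno = ETIMEDOUT;
        return -1;
    }
    for (i = 0; i < nb; i++)
        dest[i] = 0;
    return nb;
}

static void fake_close(void *ctx)
{
    struct fake_dev *dev = ctx;
    dev->closed = 1;
}
```

With a simulated device that times out once, the read succeeds on the first retry and the connection stays open, which matches the USB behaviour reported above. A device that never answers is closed after the retries are used up.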
86bc677 to 8ef4491
|
I have encountered a potential issue with Modbus TCP that I want to further investigate. |
|
I think our timeout value is marginal for Modbus TCP. In the capture, packet 3636 is the first read and 3638 is the retry after 500 ms without a response; both are ACKed. Packet 3640 is the actual response to transaction 12622, which arrives 700 ms after the initial packet. By then we expect a response to transaction 12623, which is why we fail with "invalid data". The 500 ms timeout is fine for local connections, but I think we should increase it for TCP connections. |
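To illustrate why the late response is rejected: a Modbus TCP frame begins with an MBAP header whose first two bytes are a big-endian transaction ID. The sketch below shows the kind of check that makes a stale reply fail. The function names are hypothetical, and whether the library performs the comparison exactly this way is an assumption.

```c
#include <stdint.h>

/* The MBAP header of a Modbus TCP frame starts with a 2-byte,
 * big-endian transaction ID. */
static uint16_t mbap_transaction_id(const uint8_t *frame)
{
    return (uint16_t)((frame[0] << 8) | frame[1]);
}

/* Returns 1 if the response answers the request currently awaited. */
static int response_matches(const uint8_t *response, uint16_t expected_tid)
{
    return mbap_transaction_id(response) == expected_tid;
}
```

In the capture above, the reply carrying transaction 12622 arrives while the retry with transaction 12623 is pending, so a check like `response_matches(reply, 12623)` fails even though the device answered correctly.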
|
I think this is ready too; the TCP issues are addressed by #3418. |
|
❌ Build nut 2.8.5.4571-master failed (commit 31b574a521 by @EchterAgo) |
|
✅ Build nut 2.8.5.4593-master completed (commit 31b574a521 by @EchterAgo)
|
General points
Described the changes in the PR submission or a separate issue, e.g.
known published or discovered protocols, applicable hardware (expected
compatible and actually tested/developed against), limitations, etc.
There may be multiple commits in the PR, aligned and commented with
a functional change. Notably, coding style changes better belong in a
separate PR, but certainly in a dedicated commit to simplify reviews
of "real" changes in the other commits. Similarly for typo fixes in
comments or text documents.
Use of coding helper tools and AI should be disclosed in the commit
or PR comments (it is interesting to know which ones do a decent job).
As with other contributions, a human is responsible and thanked for the
quality and content of the change, and is presumed to have the right to
post that code to be published further under the project's license terms.
Please star NUT on GitHub, this helps with sponsorships! ;)
Frequent "underwater rocks" for driver addition/update PRs
Revised existing driver families and added a sub-driver if applicable
(`nutdrv_qx`, `usbhid-ups`...) or added a brand new driver in the other
case.
Did not extend obsoleted drivers with new hardware support features
(notably `blazer` and other single-device family drivers for Qx protocols,
except the new `nutdrv_qx` which should cover them all).
For updated existing device drivers, bumped the `DRIVER_VERSION` macro
or its equivalent.
For USB devices (HID or not), revised that the driver uses unique
VID/PID combinations, or raised discussions when this is not the case
(several vendors do use same interface chips for unrelated protocols).
For new USB devices, built and committed the changes for the
`scripts/upower/95-upower-hid.hwdb` file
Proposed NUT data mapping is aligned with existing `docs/nut-names.txt`
file. If the device exposes useful data points not listed in the file, the
`experimental.*` namespace can be used as documented there, and discussion
should be raised on the NUT Developers mailing list to standardize the new
concept.
Updated `data/driver.list.in` if applicable (new tested device info)
Frequent "underwater rocks" for general C code PRs
structure layout and alignment in memory, endianness (layout of bytes and
bits in memory for multi-byte numeric types), or use of generic `int` where
language or libraries dictate the use of `size_t` (or `ssize_t` sometimes).
Progress and errors are handled with `upsdebugx()`, `upslogx()`, `fatalx()`
and related methods, not with direct `printf()` or `exit()`.
Similarly, NUT helpers are used for error-checked memory allocation and
string operations (except where customized error handling is needed,
such as unlocking device ports, etc.)
Coding style (including whitespace for indentations) follows precedent
in the code of the file, and examples/guide in `docs/developers.txt` file.
For newly added files, the `Makefile.am` recipes were updated and the
`make distcheck` target passes.
General documentation updates
Added a bullet point into `NEWS.adoc`, possibly also `UPGRADING.adoc` if
there is something packagers or custom-build users should take into
account (new driver categories, configuration options, dependencies...)
Updated `docs/acknowledgements.txt` (for vendor-backed device support)
Added or updated manual page information in `docs/man/*.txt` files
and corresponding recipe lists in `docs/man/Makefile.am` for new pages
Passed `make spellcheck`, updated spell-checking dictionary in the
`docs/nut.dict` file if needed (did not remove any words -- the `make`
rule printout in case of changes suggests how to maintain it).
Additional work may be needed after posting this PR
Propose a PR for NUT DDL with detailed device data dumps from tests
against real hardware (the more models, the better).
Address NUT CI farm build failures for the PR: testing on numerous
platforms and toolkits can expose issues not seen on just one system.
the changed codebase.