fixes graceful termination when aleth --test is invoked but require…#5146
fixes graceful termination when aleth --test is invoked but require…#5146christianparpart wants to merge 3 commits intoethereum:masterfrom christianparpart:cp-fix-graceful-termination
aleth --test is invoked but require…#5146Conversation
| FD_ZERO(&out); | ||
| FD_ZERO(&err); | ||
| FD_SET(fd, &in); | ||
| struct timeval tv; |
There was a problem hiding this comment.
In a more C++ way this could be timeval const tv = { timeoutMillis / 1000, timeoutMillis % 1000 }
There was a problem hiding this comment.
I was a little conservative. Thx. done! :-)
| return false; | ||
| } | ||
|
|
||
| static bool waitForReadable(int fd, unsigned timeoutMillis) |
There was a problem hiding this comment.
Please move it into unnamed namespace above.
gumb0
left a comment
There was a problem hiding this comment.
Also please add a comment describing why it's needed
Codecov Report
@@ Coverage Diff @@
## develop #5146 +/- ##
==========================================
- Coverage 59.92% 59.7% -0.23%
==========================================
Files 337 337
Lines 27306 27271 -35
Branches 3173 3175 +2
==========================================
- Hits 16362 16281 -81
- Misses 9860 9906 +46
Partials 1084 1084 |
|
Can there be any better approach than calling |
|
hey @gumb0. of course there is, by using However, I didn't wanna be too intrusive for a very first PR in this repo. I can however look into finding a somewhat minimal solution of the latter suggestion I made. |
|
Let's take a step back, can you describe how exactly do you reproduce it? Now when I tried the following steps
|
…d a `SIGKILL` to terminate. This was because the UnixSocketServer requires at least one accepting connection *after* SIGTERM/SIGINT has been received, which usually never is the case. This PR implements a trivial workaround by first waiting for up to N msecs (N=1000 for now) for any input to arrive, and if not, it'll just retry, hence, evaluating m_running, resulting in `aleth --test` to properly terminate upon receival of `SIGINT` or `SIGTERM`.
| FD_ZERO(&err); | ||
| FD_SET(fd, &in); | ||
| timeval tv { timeoutMillis / 1000, timeoutMillis % 1000 }; | ||
| return select(fd + 1, &in, &out, &err, &tv) > 0; |
| // We do this test before calling accept() in order to gracefully exit | ||
| // this function when an external thread has set m_running to true | ||
| // (such as a signal handler in the main thread). | ||
| if (!waitForReadable(m_socket, 500)) |
There was a problem hiding this comment.
There is not way to add the timeout to accept()?
Try without " send some RPC calls". |
|
Still terminates fine without RPC calls. Also tried running soltest and SIGINT after that, also aleth is terminated fine. |
|
I tried it tool and it terminates quickly. Can it be macOS @christianparpart ? |
|
So I was facing this issue on "Windows Subsystem for Linux" (which basically is a Linux kernel & ABI re-implementation within the Windows NT kernel). However, I was trying to reproduce this behavior with a minimal test case, and I believe that I get the same behavior even on Ubuntu 16.04's Linux kernel 4.8.0 with the following test case (link below). So I may still be absolutely brain-dead right now, not finding the bug in my head (or we need this patch) https://gist.github.com/christianparpart/8b89c9dbfba83478d1308ecc32b9753d This code is a minimalistic TCP pong server that shares the same behavior this PR tries to fix within aleth's AF_UNIX listener code. |
|
I digged a little bit into it. it's actually not accept() being cancelled with -1 (EINTR) due to the SIGINT but a -1 (EINVAL) due to the unlink() on the socket. While that one should work on all UNIX'es, I don't think this is a clean solution, as only a side effect of closing a socket while being in a blocking I/O is made use of, which seems unreliable wrt. platform independence (?). I do like your solution (lean, but unfortunately wasn't quite obvious from the beginning), but it doesn't work on my end it seems. |
|
So we had in place a solution I hoped for, it just seems to not work on exotic systems. I would prefer to not merge this (it complicates things quite a bit) and resolving this by you @christianparpart submitting a bug report to Microsoft (is this even a bug?) Otherwise needs some work on better logging / better error reporting / better code formatting. |
|
|
|
I'd adapt the code to conform to coding style and "codecov" once you say I can proceed (:+1:). |
gumb0
left a comment
There was a problem hiding this comment.
Also please apply clang-format to your changes as described in https://github.com/ethereum/cpp-ethereum/blob/master/CONTRIBUTING.md
| return path; | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
We use only // for multiline comments
| * @retval true One file descriptor became available. | ||
| * @retval false Most likely an error occured, or a timeout. | ||
| */ | ||
| static bool waitForReadable(std::initializer_list<int> sockets) |
There was a problem hiding this comment.
This doesn't need to be declared static as it's in the unnamed namespace
| { | ||
| int dummy = 0x90; | ||
| if (::write(m_exitChannel[1], &dummy, sizeof(dummy)) < 0) | ||
| cerr << "Failed to notify UNIX domain server loop. " << strerror(errno) << "\n"; |
There was a problem hiding this comment.
Better output it to clog(VerbosityWarn, "rpc"), It will not need \n in the end.
| m_exitChannel{-1, -1} | ||
| { | ||
| if (pipe(m_exitChannel) < 0) | ||
| abort(); // I don't like. Replace me with throw. |
There was a problem hiding this comment.
Throw exception here.
Declare it in IpcServerBase.h as
DEV_SIMPLE_EXCEPTION(FailedToCreatePipe);
Then throw here like
BOOST_THROW_EXCEPTION(FailedToCreatePipe());
Fixes graceful termination when
aleth --testis invoked but required aSIGKILLto terminate.This was because the UnixSocketServer requires at least one accepting
connection after SIGTERM/SIGINT has been received, which usually never
is the case.
This PR implements a trivial workaround by first waiting for up to N
msecs (N=1000 for now) for any input to arrive, and if not, it'll just
retry, hence, evaluating m_running, resulting in
aleth --testtoproperly terminate upon receival of
SIGINTorSIGTERM.