fix(inkless:systest): fix sigstop and slow consumer giving false negatives #659

Merged. gqmelo merged 2 commits into main from glillo/fix-switch-systests on Jun 23, 2026.

@giuseppelillo giuseppelillo commented Jun 17, 2026

8756bec: avoid false negatives due to slow consumers
332cec1: actually send SIGSTOP to the broker and verify that it really stops

@giuseppelillo giuseppelillo changed the title fix(inkless:systest): fix sigstop and fix(inkless:systest): fix sigstop and slow consumer giving false negatives Jun 17, 2026
@giuseppelillo giuseppelillo force-pushed the glillo/fix-switch-systests branch from d270c07 to efc2498 Compare June 18, 2026 14:47
@giuseppelillo giuseppelillo requested a review from Copilot June 19, 2026 08:40

Copilot AI left a comment

Pull request overview

This PR hardens the inkless classic→diskless topic switch system test to reduce false negatives by (1) making the “consume exact count” path resilient to temporarily slow diskless fetch tails and (2) ensuring the SIGSTOP-based leader fault injection actually stops (and resumes) the broker JVM.

Changes:

  • Increase/parameterize console-consumer idle timeout for exact-count reads and adjust completion logic in _consume_all_from_beginning.
  • Fix Trogdor SIGSTOP targeting by using a literal jcmd -l match string for the broker process.
  • Add verification helpers to assert the broker actually enters stopped (ps state T) and later resumes after SIGCONT in the sigstop scenario.
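The first change can be pictured with a toy model of an exact-count read. This is only a sketch under assumptions: the function name, the gap list, and the timeout values are hypothetical and do not come from `_consume_all_from_beginning` or the console consumer. It shows why a consumer that gives up after a short idle period can report too few records when the tail of a fetch is temporarily slow, while a longer idle timeout lets the same read complete.

```python
def reaches_exact_count(gaps_sec, expected_count, idle_timeout_sec):
    """Toy model: return True if expected_count records arrive before the
    consumer sits idle for longer than idle_timeout_sec.

    gaps_sec[i] is the (hypothetical) wait before record i arrives.
    """
    received = 0
    for gap in gaps_sec:
        if gap > idle_timeout_sec:
            # The consumer exits on idle timeout before this record arrives.
            return False
        received += 1
        if received == expected_count:
            return True
    # The stream ended before the expected count was reached.
    return False


# Hypothetical arrival pattern: two fast records, then a slow tail.
slow_tail = [0.1, 0.1, 8.0]

short_timeout_ok = reaches_exact_count(slow_tail, 3, idle_timeout_sec=5.0)
long_timeout_ok = reaches_exact_count(slow_tail, 3, idle_timeout_sec=30.0)
```

With the short timeout the read stops at two records even though the third was on its way, which is the kind of false negative the PR description mentions.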

Comment thread tests/kafkatest/tests/inkless/inkless_topic_switch_test.py Outdated
Comment thread tests/kafkatest/tests/inkless/inkless_topic_switch_test.py Outdated
The test passed KafkaService.java_class_name() (regex kafka\.Kafka)
to Trogdor's ProcessStopFaultSpec, but Trogdor's worker matches
the target JVM by literal substring against jcmd -l.
The escaped form never matched the real kafka.Kafka line,
so SIGSTOP/SIGCONT were sent to zero PIDs and the leader was never
actually frozen; the scenario passed without testing anything.
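The mismatch described above can be reproduced in isolation. The sketch below is illustrative only: the `jcmd -l` output line is a made-up example, and `substring_match` is a hypothetical stand-in for a literal substring check, not Trogdor's actual code.

```python
import re

# Hypothetical line in the style of `jcmd -l` output: pid, main class, args.
jcmd_line = "12345 kafka.Kafka /opt/kafka/config/server.properties"

regex_form = r"kafka\.Kafka"   # escaped form, meant to be used as a regex
literal_form = "kafka.Kafka"   # literal main-class name


def substring_match(needle, line):
    """Literal substring check (assumed stand-in for the worker's matching)."""
    return needle in line


# Used as a regex, the escaped form matches the line.
regex_hits = re.search(regex_form, jcmd_line) is not None

# Used as a literal substring, the backslash is part of the needle,
# so the escaped form never matches and zero processes are selected.
escaped_hits = substring_match(regex_form, jcmd_line)
literal_hits = substring_match(literal_form, jcmd_line)
```

Here `regex_hits` and `literal_hits` are true while `escaped_hits` is false, which is consistent with the signals being sent to no process at all.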

Fix by passing the literal main-class name (kafka.Kafka) so the
signal reaches the broker, and verify the fault actually took
effect: assert the broker JVM reaches ps state T (stopped) during
the pause and returns to running after SIGCONT, so any future
no-op fails loudly instead of silently exercising nothing.
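A check of this kind could look roughly like the sketch below. It is not the helper added in this PR: the function names are hypothetical, it runs `ps` locally through `subprocess` rather than on a remote test node, and it freezes a throwaway `sleep` process instead of a broker JVM. It assumes a POSIX system where `ps -o state= -p <pid>` is available.

```python
import os
import signal
import subprocess
import time


def ps_state(pid):
    """Return the ps state column for pid, or '' if the process is gone."""
    result = subprocess.run(
        ["ps", "-o", "state=", "-p", str(pid)],
        capture_output=True, text=True,
    )
    return result.stdout.strip()


def is_stopped(state):
    """True when the ps state starts with 'T' (stopped by a job-control signal)."""
    return state.startswith("T")


def wait_until(condition, timeout_sec, backoff_sec=0.1):
    """Poll condition() until it returns True or timeout_sec elapses."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(backoff_sec)
    return condition()


def demo():
    """Freeze and resume a throwaway child process, asserting both transitions."""
    child = subprocess.Popen(["sleep", "30"])
    try:
        os.kill(child.pid, signal.SIGSTOP)
        assert wait_until(lambda: is_stopped(ps_state(child.pid)), 5), \
            "process never reached state T"
        os.kill(child.pid, signal.SIGCONT)
        assert wait_until(lambda: not is_stopped(ps_state(child.pid)), 5), \
            "process did not resume after SIGCONT"
    finally:
        child.kill()
        child.wait()
```

Calling `demo()` raises an `AssertionError` if the signal has no visible effect, which mirrors the intent of failing loudly when the fault is a no-op. Note that `is_stopped` also returns False for an empty state, so a real helper would likely want to treat a vanished process separately.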

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread tests/kafkatest/tests/inkless/inkless_topic_switch_test.py
@gqmelo gqmelo merged commit d69f48a into main Jun 23, 2026
6 checks passed
@gqmelo gqmelo deleted the glillo/fix-switch-systests branch June 23, 2026 11:27