Skip to content

refactor: add NodeBuilder to simplify container.Node creation#198

Open
chatton wants to merge 3 commits into
mainfrom
chatton/simplify-container-node-creation
Open

refactor: add NodeBuilder to simplify container.Node creation#198
chatton wants to merge 3 commits into
mainfrom
chatton/simplify-container-node-creation

Conversation

@chatton
Copy link
Copy Markdown
Collaborator

@chatton chatton commented May 21, 2026

Summary

  • Adds a fluent NodeBuilder API to container package that handles Node + Lifecycle + Volume setup in a single Build() call
  • Converts all 12 internal callers from the 3-step pattern to the builder
  • Old NewNode, SetContainerLifecycle, and CreateAndSetupVolume marked deprecated but preserved for backwards compatibility
  • Builder supports WithVolumeName (for shared volumes) and WithHostNetwork (for host networking)
  • Extracts nodeName helpers in each package to decouple name computation from the embedded Node

Closes #131

Summary by CodeRabbit

  • Refactor

    • Switched many container/node constructions to a new builder-based initialization for more consistent setup.
    • Centralized container naming into shared helpers for stable, uniform names.
    • Marked legacy direct constructors as deprecated in favor of the builder approach.
  • Chores

    • Added lint-suppression annotations in tests for deprecated setup APIs during the transition.

Review Change Stack

Replace the 3-step container.Node setup (NewNode + SetContainerLifecycle +
CreateAndSetupVolume) with a fluent NodeBuilder API. Old functions are
preserved but marked deprecated for backwards compatibility.
@chatton chatton requested a review from tty47 as a code owner May 21, 2026 09:01
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

📝 Walkthrough

Walkthrough

Adds a fluent NodeBuilder API and migrates many Node/agent/relayer constructors to use it, centralizes container naming helpers, suppresses staticcheck in tests that still use deprecated constructors, and improves docker keyring exec/cleanup behavior.

Changes

Container Node Builder Refactoring

Layer / File(s) Summary
NodeBuilder Foundation
framework/docker/container/node.go
NodeBuilder fluent API with chainable configuration methods and Build that initializes Node, lifecycle, optional host network, resolves volume name, and creates/sets up the volume. NewNode and CreateAndSetupVolume are annotated with deprecation comments.
Cosmos Chain Node Migration
framework/docker/cosmos/chain_builder.go, framework/docker/cosmos/node.go
Inline ChainNode construction using NodeBuilder, default container home-dir logic, derive effective nodeType when unset, wrap built container node into ChainNode, conditional genesis keyring preload for validators, and extract chainNodeName helper; NewChainNode marked deprecated.
Data Availability Network Stack
framework/docker/dataavailability/network_builder.go, framework/docker/dataavailability/node.go
NetworkBuilder.newNode and DA Node now use NodeBuilder with homeDir fallback and wrap the built container Node into the DA Node struct. Add daNodeName helper and deprecate old NewNode.
EVM Stack Variants Migration
framework/docker/evstack/...
Migrate evstack aggregator and variant nodes (evmsingle, reth, spamoor, evstack root) to NodeBuilder construction, extract helpers (evstackNodeName, evmSingleNodeName, rethNodeName, spamoorNodeName), and update Name() methods.
Hyperlane Services Migration
framework/docker/hyperlane/*
Migrate agent, forward relayer, and deployer to NodeBuilder construction, add naming helpers (agentNodeName, forwardRelayerNodeName, deployerNodeName), and return constructed nodes without separate lifecycle/volume setup.
IBC Relayer & Monitoring Nodes
framework/docker/ibc/relayer/hermes.go, framework/docker/jaeger/node.go, framework/docker/victoriatraces/node.go
Migrate Hermes, Jaeger, and VictoriaTraces constructors to NodeBuilder, add centralized hermesNodeName/nodeName helpers, and update Name() implementations.
Test Lint Suppressions for Deprecated APIs
framework/docker/chain_node_test.go, framework/docker/node_volume_test.go
Add //nolint:staticcheck annotations where tests still invoke deprecated node/volume APIs.
Docker Keyring Exec & Cleanup Hardening
framework/docker/internal/docker_keyring.go
Resolve key records before deletion and pass into container cleanup; change execCommand to poll ExecInspect until completion and enforce zero exit code; make .address deletion conditional on provided record; add time import.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • tty47
  • mojtaba-esk

Poem

🐰 A little builder hops to play,
Chains of nodes made neat today,
Names now clean and setup whole,
Tests stay quiet, linters console,
Volumes hum and execs wait — hooray!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: introducing a NodeBuilder API to simplify Node creation.
Linked Issues check ✅ Passed The PR fully implements the requirements from issue #131: introduces a single NodeBuilder constructor that encapsulates Node, Lifecycle, and Volume setup, and migrates all 12 internal callers to use it.
Out of Scope Changes check ✅ Passed All changes are directly related to the objective of simplifying Node creation. The docker_keyring.go changes harden key deletion during node cleanup, which is related but focused on fixing a race condition in the node setup process.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chatton/simplify-container-node-creation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatton chatton marked this pull request as draft May 21, 2026 09:02
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@framework/docker/container/node.go`:
- Around line 119-134: The Build method on NodeBuilder should validate required
inputs before creating the Node: check that containerName and b.homeDir are
non-empty, and if b.hostNetwork is false ensure b.networkID is non-empty; return
a clear error (with context mentioning NodeBuilder.Build and containerName) if
any check fails. Do these checks at the top of NodeBuilder.Build (before calling
NewNode/NewLifecycle/CreateAndSetupVolume) so invalid config is rejected fast
and later calls like NewNode or CreateAndSetupVolume are not invoked with
missing values.

In `@framework/docker/cosmos/chain_builder.go`:
- Around line 493-501: The fallback branch incorrectly re-checks
nodeConfig.nodeType causing the validator path to be unreachable; when
nodeType==0, derive it from the validator flag instead of re-reading
nodeConfig.nodeType. Update the branch in chain_builder.go to: if nodeType == 0
{ if nodeConfig.Validator { nodeType = types.NodeTypeValidator } else { nodeType
= types.NodeTypeConsensusFull } }, referencing nodeConfig, nodeType,
types.NodeTypeValidator and types.NodeTypeConsensusFull to locate the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b7ac380-e844-4471-beb9-1f5f9491c78d

📥 Commits

Reviewing files that changed from the base of the PR and between 7288006 and bb27f33.

📒 Files selected for processing (18)
  • framework/docker/chain_node_test.go
  • framework/docker/container/node.go
  • framework/docker/cosmos/chain_builder.go
  • framework/docker/cosmos/node.go
  • framework/docker/dataavailability/network_builder.go
  • framework/docker/dataavailability/node.go
  • framework/docker/evstack/chain_builder.go
  • framework/docker/evstack/evmsingle/node.go
  • framework/docker/evstack/node.go
  • framework/docker/evstack/reth/node.go
  • framework/docker/evstack/spamoor/node.go
  • framework/docker/hyperlane/agent.go
  • framework/docker/hyperlane/forward_relayer.go
  • framework/docker/hyperlane/hyperlane.go
  • framework/docker/ibc/relayer/hermes.go
  • framework/docker/jaeger/node.go
  • framework/docker/node_volume_test.go
  • framework/docker/victoriatraces/node.go

Comment on lines +119 to +134
func (b *NodeBuilder) Build(ctx context.Context, containerName string) (*Node, error) {
n := NewNode(b.networkID, b.dockerClient, b.testName, b.image, b.homeDir, b.index, b.nodeType, b.logger)
lc := NewLifecycle(b.logger, b.dockerClient, containerName)
if b.hostNetwork {
lc.SetHostNetwork(b.hostNetwork)
}
n.ContainerLifecycle = lc
volName := containerName
if b.volumeName != "" {
volName = b.volumeName
}
if err := n.CreateAndSetupVolume(ctx, volName); err != nil {
return nil, fmt.Errorf("setup node %q: %w", containerName, err)
}
return n, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate required builder inputs in Build before creating the node.

Build currently allows invalid state (e.g., empty containerName/homeDir, or empty networkID when host mode is off), which can fail later with less actionable Docker errors. Add upfront validation here to fail fast.

Suggested patch
 func (b *NodeBuilder) Build(ctx context.Context, containerName string) (*Node, error) {
+	if containerName == "" {
+		return nil, fmt.Errorf("containerName cannot be empty")
+	}
+	if b.homeDir == "" {
+		return nil, fmt.Errorf("homeDir cannot be empty")
+	}
+	if !b.hostNetwork && b.networkID == "" {
+		return nil, fmt.Errorf("networkID cannot be empty when host network is disabled")
+	}
+
 	n := NewNode(b.networkID, b.dockerClient, b.testName, b.image, b.homeDir, b.index, b.nodeType, b.logger)
 	lc := NewLifecycle(b.logger, b.dockerClient, containerName)
 	if b.hostNetwork {
 		lc.SetHostNetwork(b.hostNetwork)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (b *NodeBuilder) Build(ctx context.Context, containerName string) (*Node, error) {
n := NewNode(b.networkID, b.dockerClient, b.testName, b.image, b.homeDir, b.index, b.nodeType, b.logger)
lc := NewLifecycle(b.logger, b.dockerClient, containerName)
if b.hostNetwork {
lc.SetHostNetwork(b.hostNetwork)
}
n.ContainerLifecycle = lc
volName := containerName
if b.volumeName != "" {
volName = b.volumeName
}
if err := n.CreateAndSetupVolume(ctx, volName); err != nil {
return nil, fmt.Errorf("setup node %q: %w", containerName, err)
}
return n, nil
}
func (b *NodeBuilder) Build(ctx context.Context, containerName string) (*Node, error) {
if containerName == "" {
return nil, fmt.Errorf("containerName cannot be empty")
}
if b.homeDir == "" {
return nil, fmt.Errorf("homeDir cannot be empty")
}
if !b.hostNetwork && b.networkID == "" {
return nil, fmt.Errorf("networkID cannot be empty when host network is disabled")
}
n := NewNode(b.networkID, b.dockerClient, b.testName, b.image, b.homeDir, b.index, b.nodeType, b.logger)
lc := NewLifecycle(b.logger, b.dockerClient, containerName)
if b.hostNetwork {
lc.SetHostNetwork(b.hostNetwork)
}
n.ContainerLifecycle = lc
volName := containerName
if b.volumeName != "" {
volName = b.volumeName
}
if err := n.CreateAndSetupVolume(ctx, volName); err != nil {
return nil, fmt.Errorf("setup node %q: %w", containerName, err)
}
return n, nil
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@framework/docker/container/node.go` around lines 119 - 134, The Build method
on NodeBuilder should validate required inputs before creating the Node: check
that containerName and b.homeDir are non-empty, and if b.hostNetwork is false
ensure b.networkID is non-empty; return a clear error (with context mentioning
NodeBuilder.Build and containerName) if any check fails. Do these checks at the
top of NodeBuilder.Build (before calling
NewNode/NewLifecycle/CreateAndSetupVolume) so invalid config is rejected fast
and later calls like NewNode or CreateAndSetupVolume are not invoked with
missing values.

Comment thread framework/docker/cosmos/chain_builder.go Outdated
chatton added 2 commits May 21, 2026 10:41
- Remove dead branch in cosmos ChainBuilder that checked nodeType after
  confirming it was zero. Use derived nodeType for Validator field.
- Poll ExecInspect until exec completes in docker keyring execCommand
  to prevent race between rm and subsequent container reads.
- Resolve key address before local delete so the .address file can be
  cleaned up from the container.
@chatton chatton marked this pull request as ready for review May 21, 2026 10:18
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
framework/docker/internal/docker_keyring.go (1)

440-445: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t swallow .address cleanup errors.

Line 444 ignores execCommand errors, which can leave stale address files silently and break consistency after successful key deletion. Return (or at least wrap) this error.

Suggested fix
 	if record != nil {
 		addr, err := record.GetAddress()
 		if err == nil {
 			addrFilePath := filepath.Join(d.containerKeyringDir, addr.String()+".address")
-			_ = d.execCommand(context.TODO(), []string{"rm", "-f", addrFilePath})
+			if err := d.execCommand(context.TODO(), []string{"rm", "-f", addrFilePath}); err != nil {
+				return fmt.Errorf("failed to delete address file %s: %w", addrFilePath, err)
+			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@framework/docker/internal/docker_keyring.go` around lines 440 - 445, The
cleanup of the ".address" file swallows errors from d.execCommand when removing
addrFilePath: after obtaining addr via record.GetAddress() and building
addrFilePath (using d.containerKeyringDir and addr.String()+".address"), check
the error returned by d.execCommand(context.TODO(), []string{"rm","-f",
addrFilePath}) and propagate it (or wrap it with context) instead of discarding;
update the surrounding function (where record, GetAddress, addrFilePath and
d.execCommand are used) to return that error so callers can handle cleanup
failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@framework/docker/internal/docker_keyring.go`:
- Around line 316-328: The exec polling loop that calls
d.dockerClient.ExecInspect with exec.ID currently blocks forever; change it to
respect context cancellation or a bounded timeout by either requiring a context
with deadline or wrapping the loop with a select that checks ctx.Done() (and/or
a timeout timer) on each iteration, and return a descriptive error if cancelled
or timed out; update the polling block around ExecInspect, inspect.Running, cmd
and ctx to abort when ctx.Done() triggers and propagate the context error
instead of looping indefinitely.

---

Outside diff comments:
In `@framework/docker/internal/docker_keyring.go`:
- Around line 440-445: The cleanup of the ".address" file swallows errors from
d.execCommand when removing addrFilePath: after obtaining addr via
record.GetAddress() and building addrFilePath (using d.containerKeyringDir and
addr.String()+".address"), check the error returned by
d.execCommand(context.TODO(), []string{"rm","-f", addrFilePath}) and propagate
it (or wrap it with context) instead of discarding; update the surrounding
function (where record, GetAddress, addrFilePath and d.execCommand are used) to
return that error so callers can handle cleanup failures.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 107b1110-d893-4046-a543-f125e8a7dc19

📥 Commits

Reviewing files that changed from the base of the PR and between bb27f33 and 9996951.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • framework/docker/cosmos/chain_builder.go
  • framework/docker/cosmos/node.go
  • framework/docker/evstack/node.go
  • framework/docker/evstack/spamoor/node.go
  • framework/docker/internal/docker_keyring.go

Comment on lines +316 to 328
for {
inspect, err := d.dockerClient.ExecInspect(ctx, exec.ID, client.ExecInspectOptions{})
if err != nil {
return fmt.Errorf("failed to inspect exec result: %w", err)
}
if !inspect.Running {
if inspect.ExitCode != 0 {
return fmt.Errorf("command %v exited with non-zero status: %d", cmd, inspect.ExitCode)
}
return nil
}
time.Sleep(50 * time.Millisecond)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a bounded timeout/cancellation path for exec polling.

Line 316 introduces an unbounded wait loop; with context.TODO() callers (e.g., Line 436/444), this can hang forever if Docker exec never exits. Wrap the poll path with a timeout (or require a deadline-bound ctx) and abort on ctx.Done().

Suggested fix
 func (d *dockerKeyring) execCommand(ctx context.Context, cmd []string) error {
+	if _, hasDeadline := ctx.Deadline(); !hasDeadline {
+		var cancel context.CancelFunc
+		ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
+		defer cancel()
+	}
+
 	exec, err := d.dockerClient.ExecCreate(ctx, d.containerID, client.ExecCreateOptions{
 		Cmd: cmd,
 	})
@@
-	for {
+	ticker := time.NewTicker(50 * time.Millisecond)
+	defer ticker.Stop()
+	for {
+		select {
+		case <-ctx.Done():
+			return fmt.Errorf("exec command timed out/canceled: %w", ctx.Err())
+		case <-ticker.C:
+		}
+
 		inspect, err := d.dockerClient.ExecInspect(ctx, exec.ID, client.ExecInspectOptions{})
 		if err != nil {
 			return fmt.Errorf("failed to inspect exec result: %w", err)
 		}
 		if !inspect.Running {
 			if inspect.ExitCode != 0 {
 				return fmt.Errorf("command %v exited with non-zero status: %d", cmd, inspect.ExitCode)
 			}
 			return nil
 		}
-		time.Sleep(50 * time.Millisecond)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@framework/docker/internal/docker_keyring.go` around lines 316 - 328, The exec
polling loop that calls d.dockerClient.ExecInspect with exec.ID currently blocks
forever; change it to respect context cancellation or a bounded timeout by
either requiring a context with deadline or wrapping the loop with a select that
checks ctx.Done() (and/or a timeout timer) on each iteration, and return a
descriptive error if cancelled or timed out; update the polling block around
ExecInspect, inspect.Running, cmd and ctx to abort when ctx.Done() triggers and
propagate the context error instead of looping indefinitely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make container.Node creation less clunky

1 participant