feat: add CEL-based batch_trigger gate for round start#1046
feat: add CEL-based batch_trigger gate for round start#1046
Conversation
Add an optional ARKD_BATCH_TRIGGER CEL formula that decides whether the server should start a new batch round. When unset (default), the legacy "always start a round" behaviour is preserved. Exposes intents_count, current_feerate, time_since_last_batch, boarding_inputs_count, total_boarding_amount, total_intent_fees and a now() helper to the program. Programs are compiled once at startup and re-used per round; eval errors fail open so a buggy formula does not permanently halt rounds. Closes #1045
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (4)
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🔍 Arkana Code Review — arkd#1046
feat: add CEL-based batch_trigger gate for round start
Overall: Well-structured PR that follows the established arkfee CEL pattern. The package decomposition, compile-once-eval-many design, fail-open semantics, and test coverage are solid. No cross-repo breakage — NewService is internal and all callers are updated.
That said, this touches round lifecycle (when rounds start). Flagging for mandatory human review per protocol.
🟡 Issues
1. Cold-start footgun: time_since_last_batch == 0 is ambiguous (Medium)
pkg/ark-lib/batchtrigger/types.go:18, internal/core/application/service.go:2434-2438
After server restart, lastBatchAt is 0 (not persisted), so time_since_last_batch is 0. This makes 0 mean both "never batched since boot" and "batched 0 seconds ago" — semantically different states.
Practical consequence: An operator writes:
intents_count > 5.0 || time_since_last_batch >= 3600.0
After a restart with 3 pending intents, the server waits up to an hour before the first batch despite having work to do, because time_since_last_batch reads as 0 (recently batched) instead of ∞ (never batched).
Suggested fix: Initialize lastBatchAt to 0 but treat 0 as "infinitely stale" — i.e., when last == 0, set TimeSinceLastBatch to math.MaxInt64 (or a large sentinel like 999999). Alternatively, persist lastBatchAt to the repo/cache so it survives restarts. Document whichever choice you make prominently in the README.
2. Double cache read creates minor inconsistency (Low)
internal/core/application/service.go:2420-2447
collectTriggerContext calls s.cache.Intents().Len(ctx) for IntentsCount, then separately calls s.cache.Intents().ViewAll(ctx, nil) for aggregation. Between the two calls, intents can be pushed or popped. Result: intents_count in the CEL context may not match the data backing boarding_inputs_count/total_intent_fees.
Suggested fix: Drop the Len() call. Use int64(len(intents)) from the ViewAll result instead. Single read, always consistent.
3. Redundant cache cleanup on retry (Low)
internal/core/application/service.go:2299-2333 vs 2528-2543
When the gate denies and retries, startRound re-enters and hits the existingRound != nil branch again, running the full cleanup (DeleteVtxos, Reset forfeit txs, etc.) on already-cleaned data. These ops are idempotent so it's not a bug, but it's wasted work every sessionDuration/6 seconds while the gate denies.
Suggested fix: Move the gate check above the cleanup block, or add a guard (e.g., if existingRound.Id != lastCleanedRoundId). Not blocking.
🟢 Things Done Well
- Compile-time validation: Invalid CEL programs are rejected in
Config.Validate()before the server boots. Operators get fast feedback. - Fail-open on eval errors: A runtime eval failure starts the batch rather than halting it. Correct safety direction for a financial system.
- Atomic
lastBatchAt: No mutex needed, correct concurrent access pattern. - Test coverage: 5 aggregation cases + 9 eval cases + nil/empty/all-variables tests + JSON fixture-driven tests for the package itself. Solid.
DeleteVtxosdoesn't affect intent data: Confirmed — it operates on a separate vtxo tracking set. Pending intents survive cleanup and remain readable by the trigger.
❓ Questions for Author
- Was the cold-start
time_since_last_batch == 0behavior intentional? The README says "0 if no batch finalized since boot" but this makes time-based escape hatches fail on fresh starts. - Any plans to persist
lastBatchAtto survive restarts? For longtime_since_last_batchthresholds (hours), losing this on restart is operationally painful.
🔴 Human Review Required
This PR modifies round lifecycle — specifically when startRound() proceeds vs. defers. While the gate can only delay rounds (never corrupt them), and the fail-open design limits blast radius, this is protocol-adjacent code on a Bitcoin L2. A human must review and approve before merge.
cc @Kukks — nice clean PR overall. The cold-start semantics (#1) are the main thing I'd want addressed before shipping.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (5)
pkg/ark-lib/batchtrigger/trigger.go (2)
19-22: Embeddedcel.Programpromotes the entire CEL API onto*Trigger.With an anonymous
cel.Programfield, callers can bypassTrigger.Evaland calltrigger.Eval(map[string]any{...})(thecel.Programmethod) directly, sidestepping the nil-receiver guard and native-bool conversion. Making the program a named field prevents accidental misuse and keeps the public surface minimal.♻️ Proposed change
type Trigger struct { - cel.Program + program cel.Program txt string }Then update
Evalto uset.program.Eval(...)(line 68).Parseat line 51 becomes&Trigger{program: prg, txt: txt}.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/ark-lib/batchtrigger/trigger.go` around lines 19 - 22, The Trigger struct currently embeds cel.Program which exposes the whole CEL API on *Trigger; change the anonymous field to a named field (e.g., program cel.Program) in the Trigger type, update Parse (where prg and txt are returned) to construct the trigger as &Trigger{program: prg, txt: txt}, and modify Trigger.Eval to call t.program.Eval(...) (and retain the nil-receiver/natural-bool conversion logic) so callers cannot bypass Trigger.Eval by calling the embedded cel.Program directly.
43-45: UseIsExactTypeinstead of pointer comparison for CEL type equality.
ast.OutputType() != cel.BoolTyperelies on pointer equality with the singletoncel.BoolTypeinstance. While this works for primitives in cel-go, the semantically correct and idiomatic approach isType.IsExactType(). CEL-go's own type checker usesIsExactTypefor type comparisons. Using pointer inequality is fragile and could silently fail if cel-go's implementation wraps or caches types differently in the future.Suggested fix
- if ast.OutputType() != cel.BoolType { + if !ast.OutputType().IsExactType(cel.BoolType) { return nil, fmt.Errorf("expected return type bool, got %v", ast.OutputType()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/ark-lib/batchtrigger/trigger.go` around lines 43 - 45, The comparison using pointer equality on CEL types is fragile: replace the check in the trigger validation that currently does ast.OutputType() != cel.BoolType with a semantic type check using IsExactType (e.g., ast.OutputType().IsExactType(cel.BoolType())) so the code uses CEL's Type.IsExactType API; update the error path in the same function (where the fmt.Errorf is returned) to be triggered when IsExactType returns false.internal/core/application/service.go (2)
148-151: Duplicate CEL compilation — see comment inconfig.go.The program is already compiled in
Config.Validate()for validation purposes; the same text is compiled again here. Not incorrect, but easy to dedupe by plumbing the compiled*batchtrigger.Triggerthrough instead of the raw string. Downstream impact of the root cause ininternal/config/config.go(Validate).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/application/service.go` around lines 148 - 151, The code recompiles the CEL program here using batchtrigger.New(batchTriggerProgram) even though Config.Validate() already compiles it; instead, plumb the compiled *batchtrigger.Trigger through the config into the service so you reuse the existing compiled trigger rather than the raw string. Modify the Config to expose the compiled *batchtrigger.Trigger produced in Config.Validate() (or add a getter), update the service constructor to accept that compiled trigger instead of batchTriggerProgram, and replace the call to batchtrigger.New in the service (the batchTrigger variable) with the passed-in compiled *batchtrigger.Trigger to avoid duplicate compilation.
2435-2442:ViewAllon every gate recheck can be costly with large intent queues.When the gate denies, the recheck cadence is
sessionDuration/6(default 5s). Each recheck materializes every pending intent viaViewAlland walks them to aggregate boarding/fee data. For deployments with a growing intent backlog (which is precisely the scenario where a trigger likeintents_count > Nis useful), this O(N) pass per 5s is wasted work whenever the formula doesn't actually reference the aggregates.Consider either:
- only calling
ViewAllwhen the formula references a boarding/fee variable (cel-go exposes the reference set viacel.AstToCheckedExpr/ checked references), or- caching the aggregates for a short TTL.
Not a blocker, but worth a note for future optimization.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/application/service.go` around lines 2435 - 2442, The code always calls s.cache.Intents().ViewAll and aggregateIntentTriggerData during every gate recheck which is costly; modify the gate recheck path to avoid the O(N) pass unless needed by (a) detecting whether the configured CEL trigger expression references the boarding/fee fields (e.g., inspect the CEL checked AST or reference set for names like BoardingInputsCount, TotalBoardingAmount, TotalIntentFees) and only call ViewAll + aggregateIntentTriggerData when those names are present, or (b) compute and store the aggregates once and reuse them for a short TTL before re-materializing; update the logic around s.cache.Intents().ViewAll, aggregateIntentTriggerData, and the trigger evaluation to implement one of these two approaches.internal/config/config.go (1)
664-666: Consider caching the compiled trigger onConfigto avoid double-compilation.
batchtrigger.New(c.BatchTrigger)is invoked here purely for validation, and the compiled*batchtrigger.Triggeris discarded.application.NewService(line 148 inservice.go) then recompiles the exact same program text. For a simple program this is cheap, but CEL compilation is not free and the duplication means any divergence between the two call sites (e.g. someone adds a wrapper or option) silently breaks the "validated at startup" guarantee.Optional: store the compiled trigger on
*Configand plumb it throughNewServiceinstead of the raw string.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/config/config.go` around lines 664 - 666, The code currently calls batchtrigger.New(c.BatchTrigger) only to validate and discards the compiled *batchtrigger.Trigger, causing NewService to recompile the same program; change Config to store the compiled trigger (add a field like CompiledBatchTrigger *batchtrigger.Trigger), have the config initialization call batchtrigger.New(c.BatchTrigger) once and assign the result to that field, update any callers (notably application.NewService) to accept and use the compiled *batchtrigger.Trigger instead of recompiling from c.BatchTrigger string, and ensure error paths and JSON/serialization handling for the new Config field are addressed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/core/application/service.go`:
- Around line 2456-2462: The code logs CEL evaluation failures at error level,
contradicting the README/PR which says such runtime evaluation errors are "warn
level and fail open"; update the log call in the batch trigger evaluation block
(the s.batchTrigger.Eval error handling that currently uses
log.WithError(err).Errorf and references s.batchTrigger.Source()) to use
log.WithError(err).Warnf(...) so failures are warned and the code continues to
"fail open" as intended.
- Around line 2413-2445: In collectTriggerContext, remove the separate
s.cache.Intents().Len(ctx) call (and its log) to avoid a non-atomic race between
Intents().Len and Intents().ViewAll; instead call Intents().ViewAll(ctx, nil)
once, then set tc.IntentsCount from the returned intents slice length (convert
to the same numeric type as tc.IntentsCount if needed), and keep the aggregation
via aggregateIntentTriggerData(intents) to populate
BoardingInputsCount/TotalBoardingAmount/TotalIntentFees so all fields derive
from the same snapshot.
In `@pkg/ark-lib/batchtrigger/celenv/functions.go`:
- Around line 12-20: The nowFunction binding is non-deterministic but uses
cel.FunctionBinding which gets constant-folded at compile time; change the
implementation for the "now" overload to avoid folding by either returning
types.Unknown from the FunctionBinding or (preferably) switching to
cel.LateFunctionBinding for the now overload so it is evaluated at runtime, and
remove the redundant float64 cast (use types.Double with the int64
time.Now().Unix() directly or let types.Double handle conversion); update the
overload binding in nowFunction accordingly.
---
Nitpick comments:
In `@internal/config/config.go`:
- Around line 664-666: The code currently calls batchtrigger.New(c.BatchTrigger)
only to validate and discards the compiled *batchtrigger.Trigger, causing
NewService to recompile the same program; change Config to store the compiled
trigger (add a field like CompiledBatchTrigger *batchtrigger.Trigger), have the
config initialization call batchtrigger.New(c.BatchTrigger) once and assign the
result to that field, update any callers (notably application.NewService) to
accept and use the compiled *batchtrigger.Trigger instead of recompiling from
c.BatchTrigger string, and ensure error paths and JSON/serialization handling
for the new Config field are addressed.
In `@internal/core/application/service.go`:
- Around line 148-151: The code recompiles the CEL program here using
batchtrigger.New(batchTriggerProgram) even though Config.Validate() already
compiles it; instead, plumb the compiled *batchtrigger.Trigger through the
config into the service so you reuse the existing compiled trigger rather than
the raw string. Modify the Config to expose the compiled *batchtrigger.Trigger
produced in Config.Validate() (or add a getter), update the service constructor
to accept that compiled trigger instead of batchTriggerProgram, and replace the
call to batchtrigger.New in the service (the batchTrigger variable) with the
passed-in compiled *batchtrigger.Trigger to avoid duplicate compilation.
- Around line 2435-2442: The code always calls s.cache.Intents().ViewAll and
aggregateIntentTriggerData during every gate recheck which is costly; modify the
gate recheck path to avoid the O(N) pass unless needed by (a) detecting whether
the configured CEL trigger expression references the boarding/fee fields (e.g.,
inspect the CEL checked AST or reference set for names like BoardingInputsCount,
TotalBoardingAmount, TotalIntentFees) and only call ViewAll +
aggregateIntentTriggerData when those names are present, or (b) compute and
store the aggregates once and reuse them for a short TTL before
re-materializing; update the logic around s.cache.Intents().ViewAll,
aggregateIntentTriggerData, and the trigger evaluation to implement one of these
two approaches.
In `@pkg/ark-lib/batchtrigger/trigger.go`:
- Around line 19-22: The Trigger struct currently embeds cel.Program which
exposes the whole CEL API on *Trigger; change the anonymous field to a named
field (e.g., program cel.Program) in the Trigger type, update Parse (where prg
and txt are returned) to construct the trigger as &Trigger{program: prg, txt:
txt}, and modify Trigger.Eval to call t.program.Eval(...) (and retain the
nil-receiver/natural-bool conversion logic) so callers cannot bypass
Trigger.Eval by calling the embedded cel.Program directly.
- Around line 43-45: The comparison using pointer equality on CEL types is
fragile: replace the check in the trigger validation that currently does
ast.OutputType() != cel.BoolType with a semantic type check using IsExactType
(e.g., ast.OutputType().IsExactType(cel.BoolType())) so the code uses CEL's
Type.IsExactType API; update the error path in the same function (where the
fmt.Errorf is returned) to be triggered when IsExactType returns false.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7fdbeaaf-eb87-4bbf-b362-82dbb44a7a48
📒 Files selected for processing (15)
README.mdenvs/arkd.dev.envenvs/arkd.light.envinternal/config/config.gointernal/core/application/batch_trigger_test.gointernal/core/application/service.gopkg/ark-lib/batchtrigger/README.mdpkg/ark-lib/batchtrigger/celenv/env.gopkg/ark-lib/batchtrigger/celenv/functions.gopkg/ark-lib/batchtrigger/celenv/variables.gopkg/ark-lib/batchtrigger/testdata/invalid.jsonpkg/ark-lib/batchtrigger/testdata/valid.jsonpkg/ark-lib/batchtrigger/trigger.gopkg/ark-lib/batchtrigger/trigger_test.gopkg/ark-lib/batchtrigger/types.go
| var nowFunction = cel.Function("now", | ||
| cel.Overload("now_double", | ||
| []*cel.Type{}, | ||
| cel.DoubleType, | ||
| cel.FunctionBinding(func(_ ...ref.Val) ref.Val { | ||
| return types.Double(float64(time.Now().Unix())) | ||
| }), | ||
| ), | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
cel-go cel.FunctionBinding non-deterministic function constant folding
💡 Result:
In cel-go, the constant folding optimizer evaluates function calls with constant arguments at compile time to replace them with literal results, improving runtime performance. However, it skips folding for non-deterministic or side-effecting functions to avoid incorrect optimization. Key mechanisms: - The optimizer (NewConstantFoldingOptimizer) matches calls where all arguments are constant literals and attempts to evaluate them using a temporary program with NoVars activation. - For functions intended to have side effects or be non-deterministic (e.g., random, now, logging), the implementation should return types.Unknown during constant folding evaluation. This prevents folding, as the optimizer previously errored on missing implementations but was fixed to handle Unknown gracefully (PR #1117). - Late-bound functions via cel.LateFunctionBinding (added in PR #1117, merged Apr 2025) are explicitly skipped: isLateBoundFunctionCall check in folding.go prevents folding these calls, as their runtime binding makes them unsuitable for static optimization. This ensures soundness: deterministic pure functions fold safely; non-deterministic ones do not, preserving correct evaluation behavior even under cel.OptOptimize.
Citations:
- 1: Add LateFunctionBinding declaration and fix constant folding google/cel-go#1117
- 2: https://chromium.googlesource.com/external/github.com/google/cel-go/+/HEAD/cel/folding.go
- 3: Static optimizers for constant folding google/cel-go#804
🏁 Script executed:
cat -n pkg/ark-lib/batchtrigger/celenv/functions.go | head -40Repository: arkade-os/arkd
Length of output: 623
🏁 Script executed:
# Check if types.Double accepts int64 directly
rg "func Double" --type go -A 2Repository: arkade-os/arkd
Length of output: 40
🏁 Script executed:
# Look for usage of FunctionBinding vs LateFunctionBinding in the codebase
rg "FunctionBinding|LateFunctionBinding" --type goRepository: arkade-os/arkd
Length of output: 246
🏁 Script executed:
# Check the types.Double function signature in cel-go
rg "types\.Double" --type go -B 2 -A 2 | head -30Repository: arkade-os/arkd
Length of output: 743
🏁 Script executed:
# Check if there's any usage of types.Unknown in the codebase for non-deterministic functions
rg "types\.Unknown" --type go | head -10Repository: arkade-os/arkd
Length of output: 40
🏁 Script executed:
# Look at other function implementations in the same file
cat -n pkg/ark-lib/batchtrigger/celenv/functions.goRepository: arkade-os/arkd
Length of output: 623
🏁 Script executed:
# Check cel-go types package to see Double signature
# First, let's look for the types.Double definition in any vendored code or local definitions
fd "types.go" --type f | grep -E "common|types" | head -5Repository: arkade-os/arkd
Length of output: 236
🏁 Script executed:
# Search for Double function definition with more context
rg "func.*Double\(" --type go -A 3 | head -30Repository: arkade-os/arkd
Length of output: 40
🏁 Script executed:
# Check if there's any documentation or examples of non-deterministic functions in cel-go usage
rg "now\|random\|uuid" --type go pkg/Repository: arkade-os/arkd
Length of output: 40
🏁 Script executed:
# Check the cel-go dependency version in go.mod
grep "cel-go" go.modRepository: arkade-os/arkd
Length of output: 103
🏁 Script executed:
# Look for how now() is used in tests or actual CEL expressions
rg "now\(\)" --type go -B 2 -A 2Repository: arkade-os/arkd
Length of output: 1174
🌐 Web query:
cel-go types.Double int64 signature
💡 Result:
In CEL-Go, types.Double (types.DoubleType) and int64 (types.IntType) do not support operator overloads or function calls between them. Both types implement traits like Adder, Comparer, etc., but their method implementations (e.g., Add, Compare) are type-specific and only accept arguments of the exact same type. For example: - types.Double.Add(other ref.Val) checks if other is types.Double, else returns "no such overload". - types.Int.Add(other ref.Val) checks if other is types.Int, else returns "no such overload". Operator signatures like '+' ("+") are registered separately for each type: add_double (double + double -> double), add_int64 (int + int -> int). There are no cross-type overloads like double + int64 or int64 + double. To operate across types, explicit conversion is required, e.g., double(1i) or int(1.0d). This aligns with CEL's no-implicit-type-coercion design. Operator overload names from common/overloads include LessInt64Double ("less_int64_double"), confirming comparison overloads exist for int64-double, but core arithmetic like add does not for built-in types.
Citations:
- 1: https://pkg.go.dev/github.com/google/cel-go/common/types
- 2: https://chromium.googlesource.com/external/github.com/google/cel-go.git/+/refs/tags/v0.7.0/common/types/double.go
- 3: https://pkg.go.dev/github.com/google/cel-go/common/overloads
- 4: https://chromium.googlesource.com/external/github.com/google/cel-go.git/+/refs/tags/v0.7.0/common/types/int.go
- 5: Allow overriding singletons (add int + custom type) google/cel-go#990
🌐 Web query:
types.Double function constructor Go cel-go
💡 Result:
In the cel-go library (github.com/google/cel-go), the types.Double is a type alias for float64 defined in the github.com/google/cel-go/common/types package. It implements the ref.Val interface to support CEL operations like arithmetic, comparison, and type conversion. There is no dedicated "function constructor" like types.NewDouble or types.DoubleOf. Instead, create a types.Double value by type conversion: go import "github.com/google/cel-go/common/types" d := types.Double(3.14) // float64 value wrapped as types.Double This works because Double is type Double float64, allowing direct conversion from float64. The value supports methods like Add, Subtract, Multiply, Divide, Type (returns DoubleType), Value (returns underlying float64), and ConvertToNative. DoubleType is a singleton *types.Type exported from the package, used for type checks and declarations, e.g., cel.Variable("x", types.DoubleType). CEL expressions parse float literals (e.g., 3.14) directly as Double values during evaluation, without needing explicit construction in Go code. Sources confirm Double float64 and method implementations, with DoubleType singleton, but no exported factory function beyond the type conversion.
Citations:
- 1: https://pkg.go.dev/github.com/google/cel-go/common/types
- 2: https://chromium.googlesource.com/external/github.com/google/cel-go.git/+/refs/tags/v0.7.0/common/types/double.go
- 3: https://github.com/google/cel-go/tree/master/common/types
Critical issue: Non-deterministic function vulnerable to constant folding.
The implementation has a correctness bug: cel-go DOES constant-fold cel.FunctionBindingcalls at compile time by default, but this function is non-deterministic and lacks protection. The constant folding optimizer will evaluatenow()` once at compile time and replace all calls with a literal timestamp, breaking the intended semantics of returning the current time on each evaluation.
To prevent folding, either:
- Return
types.Unknownfrom the binding to signal non-determinism, or - Use
cel.LateFunctionBindinginstead ofcel.FunctionBinding(explicitly excluded from folding as of v0.26.1).
Secondary note: The float64(time.Now().Unix()) cast is redundant; types.Double accepts int64 directly and will implicitly convert it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/ark-lib/batchtrigger/celenv/functions.go` around lines 12 - 20, The
nowFunction binding is non-deterministic but uses cel.FunctionBinding which gets
constant-folded at compile time; change the implementation for the "now"
overload to avoid folding by either returning types.Unknown from the
FunctionBinding or (preferably) switching to cel.LateFunctionBinding for the now
overload so it is evaluated at runtime, and remove the redundant float64 cast
(use types.Double with the int64 time.Now().Unix() directly or let types.Double
handle conversion); update the overload binding in nowFunction accordingly.
There was a problem hiding this comment.
🔍 Arkana — Follow-up review (commit a5945b3)
New commit a5945b3 is cosmetic only — gofmt struct field alignment in two test files (batch_trigger_test.go, trigger_test.go). No logic changes, no new code paths.
Previous findings still stand:
- 🟡 Cold-start footgun —
time_since_last_batch == 0is ambiguous (never-batched vs just-batched) - 🟡 Double cache read —
Len()+ViewAll()can diverge under concurrent intent churn
No new issues introduced by this commit. Still flagged for human review per protocol (round lifecycle gate).
There was a problem hiding this comment.
🔍 Arkana — Follow-up review (commit ae0858e)
New commit ae0858e is a lint-only fix — drops redundant embedded struct selectors in aggregateIntentTriggerData (service.go:2399-2400):
// before (QF1008 lint warning)
inputAmount := it.Intent.TotalInputAmount() + boardingAmount
outputAmount := it.Intent.TotalOutputAmount()
// after
inputAmount := it.TotalInputAmount() + boardingAmount
outputAmount := it.TotalOutputAmount()TimedIntent embeds domain.Intent (ports/live_store.go:93), so promoted method access is semantically identical. ✅ No logic change.
Previous findings still stand (from initial review):
- 🟡 Cold-start footgun —
time_since_last_batch == 0is ambiguous (never-batched vs just-batched) - 🟡 Double cache read —
Len()+ViewAll()can diverge under concurrent intent churn
No new issues introduced. Waiting on author response to the above before approving.
- collectTriggerContext: derive IntentsCount from the same ViewAll snapshot used for the boarding/fee aggregates, eliminating the race between Len() and ViewAll(). - evalBatchTrigger: log eval failures at warn level (matches the README description of failing open). - batchtrigger.Trigger: keep cel.Program in a named (unexported) field so the package's public surface is just Eval/Source. - batchtrigger.Parse: validate output type via Type.IsExactType instead of pointer equality. - celenv: clarify in the now() comment why the binding is safe (no constant-folding optimizer is enabled on BatchTriggerEnv).
There was a problem hiding this comment.
🔍 Arkana — Follow-up review (commit 5f12bb6)
Commit 5f12bb6 ("refactor: address CodeRabbit review on batch_trigger") — all changes are correct.
What changed
| Change | Verdict |
|---|---|
Double-read race fixed — Len() removed, IntentsCount derived from len(intents) via single ViewAll() call |
✅ Exactly addresses finding #2 from initial review |
Errorf → Warnf on eval failure |
✅ Consistent with fail-open semantics |
Unexported program field — cel.Program was embedded, now program cel.Program |
✅ Proper encapsulation, no accidental interface leak |
!ast.OutputType().IsExactType(cel.BoolType) instead of != |
✅ Safer type comparison (pointer equality on *Type is fragile across CEL versions) |
Drop redundant float64() cast in now() |
✅ int64 → float64 implicit conversion is fine |
now() constant-folding comment |
✅ Good documentation |
Outstanding
🟡 Cold-start footgun (finding #1) — time_since_last_batch == 0 is still ambiguous between "never batched" and "just batched." This is an operator-facing documentation concern, not a correctness bug. Consider mentioning it in pkg/ark-lib/batchtrigger/README.md so operators know to use time_since_last_batch <= 0.0 or an explicit sentinel to handle the never-batched case.
No new issues. Still flagged for human review per protocol (round lifecycle gate).
Summary
Closes #1045.
Adds an optional
ARKD_BATCH_TRIGGERCEL formula that decides whether the server should start a new batch round. The gate is evaluated at the top ofstartRound(). If the formula returnsfalse, the round is skipped and rechecked one registration window (session_duration / 6) later. When unset (default), the legacy "always start a round" behaviour is preserved.The new
pkg/ark-lib/batchtriggerpackage mirrors the design ofpkg/ark-lib/arkfee— CEL programs are compiled once at startup and reused per evaluation. Misconfigured programs fail validation inConfig.Validate()so they never make it past server startup. Runtime eval errors fail open (logged at warn level) so a buggy formula cannot permanently halt rounds.CEL environment
intents_countdoublecurrent_feeratedoubletime_since_last_batchdouble0until first batch)boarding_inputs_countdoubletotal_boarding_amountdoubletotal_intent_feesdouble(inputs - outputs)across pending intentsnow() -> doubleis also available.Example
Test plan
go test ./pkg/ark-lib/batchtrigger/... -v -count=1— all subtests pass (compile/parse + eval against valid & invalid fixtures, all variables accessible)go test ./internal/core/application/ -run TestAggregateIntentTriggerData -v— pure aggregation helper covered (5 cases)go test ./internal/core/application/ -run TestEvalBatchTrigger -v— gate evaluation incl. issue feat: CEL formula to control batch trigger conditions #1045 examples (9 cases)GOOS=linux go vet ./...cleanGOOS=linux go build ./...cleanNotes on review feedback
CodeRabbit flagged the following, all addressed in
5f12bb66:collectTriggerContextpreviously calledIntents().Len()andIntents().ViewAll()separately, which could disagree under concurrent registrations. Both now derive from the sameViewAllsnapshot.warn(noterror) — matches the "fail open" wording in the README.batchtrigger.Triggerkeepscel.Programin a named (unexported) field so the package's public API is justEval/Source— callers can no longer bypassTrigger.Eval's nil guard.batchtrigger.Parsenow validates the output type viaType.IsExactType(cel.BoolType)instead of pointer equality.now()overload doesn't needLateFunctionBindingbecauseBatchTriggerEnvdoes not enable the constant folding optimizer (cel.OptOptimize); a comment now documents this.The two duplicate-compilation nitpicks (compile in
Validate()then again inNewService) and the "skip ViewAll when the formula doesn't reference aggregates" optimization were left as future work — the cost is microseconds at startup / per recheck, and the alternatives meaningfully grow the API surface.Summary by CodeRabbit