Fix: discard interrupted SQLite connections and make asset reads stable under churn #1043
chris-ricketts wants to merge 13 commits into
Conversation
Walkthrough

Single-query asset retrieval added (SelectAssetsWithUnspentAmountsByIds) returning per-asset unspent amounts; asset repos parse and accumulate those amounts. The SQLite layer now exposes a Read/Write SQLiteDB abstraction with pinned-connection helpers, interrupt-aware connection discard, and many SQLite repos/tests migrated to use it.
Sequence Diagram(s)

sequenceDiagram
participant Client as Client
participant Repo as AssetRepo
participant ReadDB as SQLiteDB.Read()
participant Querier as sqlc.Queries
participant VTXO as vtxo_vw
rect rgba(80,170,240,0.5)
Client->>Repo: GetAssets(ctx, ids)
end
rect rgba(120,200,80,0.5)
Repo->>ReadDB: withReadQuerier(ctx, ReadDB, fn)
ReadDB->>Querier: pin connection, create querier
Querier->>VTXO: SELECT assets LEFT JOIN vtxo_vw WHERE ids...
VTXO-->>Querier: rows with asset_amount (text)
Querier-->>Repo: []SelectAssetsWithUnspentAmountsByIdsRow
end
rect rgba(240,140,70,0.5)
Repo->>Repo: parse rows, init Supply=0, parse AssetAmount -> big.Int, accumulate per asset
Repo-->>Client: []domain.Asset (with Supply)
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Arkana Code Review — #1043
Reviewer: Arkana (aggressive review mode)
Verdict: Request changes — one correctness bug on the commit-success path, plus a missing safety pragma. Protocol-critical files touched; human sign-off required.
🔴 BUG — execTx: successful commit returns error if closeConn fails
internal/infrastructure/db/sqlite/utils.go — new execTx, around the commit path (approx lines 152-167 in the new file)
After tx.Commit() succeeds, the code calls closeConn(conn, false). If closeConn returns an error (e.g., the connection is already done, a file-descriptor issue, etc.), execTx returns that error to the caller:
if err := tx.Commit(); err != nil { ... }
if err := closeConn(conn, false); err != nil {
return err // ← commit already succeeded!
}
return nil

The caller sees an error and may retry the write. For protocol-critical operations in vtxo_repo.go (SpendVtxos, SettleVtxos, SweepVtxos) and round_repo.go (AddOrUpdateRound), this means a successfully committed transaction could be retried, leading to double-spend attempts, duplicate round entries, or at minimum confusing error propagation.
Fix: After a successful commit, log closeConn errors but don't return them. The data is already persisted — the connection cleanup is best-effort.
if err := tx.Commit(); err != nil { ... }
if err := closeConn(conn, false); err != nil {
log.WithError(err).Warn("failed to close connection after successful commit")
}
return nil

🟡 MEDIUM — Missing PRAGMA busy_timeout on both pools
internal/infrastructure/db/sqlite/utils.go — OpenDb() function
WAL mode is correctly enabled for concurrent read/write, but there's no busy_timeout set on either pool. Without it, any SQLITE_BUSY error (e.g., during WAL checkpointing, or a brief write-lock overlap) causes an immediate failure rather than a short wait-and-retry.
The write path has retry logic in execTx (which handles isConflictError), but the read path via withReadQuerier has no retry at all. Under heavy load with WAL checkpointing, a read could sporadically fail with SQLITE_BUSY.
Recommendation: Add PRAGMA busy_timeout = 5000; (or similar) on both the read and write connections right after enabling WAL:
if _, err := writeDB.Exec(`PRAGMA journal_mode = WAL;`); err != nil { ... }
if _, err := writeDB.Exec(`PRAGMA busy_timeout = 5000;`); err != nil { ... }
if _, err := readDB.Exec(`PRAGMA busy_timeout = 5000;`); err != nil { ... }

🟡 MEDIUM — Postgres AssetAmount field typed as interface{}
internal/infrastructure/db/postgres/sqlc/queries/query.sql.go — SelectAssetsWithUnspentAmountsByIdsRow
The Postgres version of the new query scans COALESCE(v.asset_amount::TEXT, '0') into interface{}:
type SelectAssetsWithUnspentAmountsByIdsRow struct {
...
AssetAmount interface{}
}

This requires the runtime stringifyAssetAmount type-switch in asset_repo.go. The SQLite version correctly uses sql.NullString. Consider adding a sqlc override or casting so the Postgres generated code also gets string or sql.NullString. The interface{} approach is fragile — a driver upgrade could change the concrete type returned.
🟢 LOW — isInterruptError relies on heuristic string matching
internal/infrastructure/db/sqlite/utils.go — isInterruptError()
errMsg := strings.ToLower(err.Error())
return strings.Contains(errMsg, "interrupted") || strings.Contains(errMsg, "sqlite_interrupt")

This is inherently fragile — any error message containing "interrupted" (even unrelated ones) would trigger connection discard. The modernc.org/sqlite driver does expose sqlite3.Error with a code field. Consider unwrapping to check the actual SQLite error code (SQLITE_INTERRUPT = 9) for a more robust check. The context-cancellation checks above are solid; this is just about the fallback heuristic.
🟢 OBSERVATION — Supply calculation semantics look correct
The old approach (per-asset SelectAssetSupply / SelectAssetAmounts) computed supply via separate queries inside a transaction. The new approach (SelectAssetsWithUnspentAmountsByIds) uses a single LEFT JOIN returning one row per unspent VTXO per asset, with client-side aggregation via indexByID. The semantics are equivalent:
- Both filter spent = false AND asset_amount > 0
- LEFT JOIN correctly produces a NULL row for assets with zero unspent VTXOs (handled by Supply: *big.NewInt(0))
- The testAssetRepositorySpentOnlySupply test correctly covers the spent-only edge case
The Postgres query joins vtxo_vw (which itself LEFT JOINs asset_projection), which is semantically equivalent to the old INNER JOIN on asset_projection + vtxo since the asset_amount > 0 filter excludes the NULL/0 rows from the LEFT JOIN.
🟢 OBSERVATION — Connection discard mechanism is sound
The closeConn(conn, discard) pattern using conn.Raw(func(any) error { return driver.ErrBadConn }) to force database/sql to evict a tainted connection is a known, correct pattern. The cancellation_test.go validates this properly.
⚠️ PROTOCOL-CRITICAL — Human review required
This PR touches vtxo_repo.go and round_repo.go, which contain protocol-critical code paths:
- vtxo_repo.go: AddVtxos, SpendVtxos, SettleVtxos, SweepVtxos, UnrollVtxos, GetVtxos, GetLeafVtxosForBatch — all core VTXO lifecycle operations
- round_repo.go: AddOrUpdateRound, GetRoundWithId, GetRoundForfeitTxs, GetRoundConnectorTree, GetRoundVtxoTree, GetSweepableRounds — round lifecycle, forfeit paths, connector trees
The changes are mechanical (wrapping existing calls in withReadQuerier/withWriteQuerier and execTx(ctx, r.db.Write(), ...)), but given that bugs here lose real money, a human must verify each call site.
Summary
| # | Severity | Finding |
|---|---|---|
| 1 | 🔴 Bug | execTx returns error after successful commit if closeConn fails — can cause retried writes |
| 2 | 🟡 Medium | Missing PRAGMA busy_timeout — reads can fail under WAL checkpoint contention |
| 3 | 🟡 Medium | Postgres AssetAmount as interface{} — fragile type assertion |
| 4 | 🟢 Low | isInterruptError string heuristic — consider SQLite error code check |
| 5 | ⚠️ Protocol | vtxo_repo.go + round_repo.go changes require human sign-off |
Overall architecture is solid — split read/write pools with WAL and connection-discard-on-interrupt is the right approach for SQLite under concurrent load. The e2e TestGetAssetQueryChurn test is thorough. Fix the commit-success-path bug (#1) before merging.
🤖 Reviewed by Arkana
Actionable comments posted: 5
🧹 Nitpick comments (4)
internal/infrastructure/db/sqlite/conviction_repo.go (1)
38-95: err == sql.ErrNoRows won't survive a wrapped error from withReadQuerier.

Both Get (line 45) and GetActiveScriptConvictions (line 79) compare the return of withReadQuerier to sql.ErrNoRows with ==. If closeConn also fails, the helper returns fmt.Errorf("%w: %w", err, closeErr) and the sentinel becomes unreachable, silently falling through to the "failed to get conviction" error instead of the intended "not found" mapping. Prefer errors.Is.

🛠️ Proposed fix
- if err == sql.ErrNoRows {
+ if errors.Is(err, sql.ErrNoRows) {
      return nil, fmt.Errorf("conviction with id %s not found", id)
  }

Apply the same change at line 79, and add "errors" to the imports.

Note: the line 79 branch is additionally unreachable in practice — SelectActiveScriptConvictions is a :many query and sqlc returns an empty slice rather than sql.ErrNoRows on an empty result. That pre-dates this PR, but worth considering while you're in the area.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/conviction_repo.go` around lines 38 - 95, replace direct equality checks against sql.ErrNoRows with errors.Is in convictionRepository.Get and GetActiveScriptConvictions so wrapped errors returned by withReadQuerier are correctly identified; update the error handling branches in both functions to use errors.Is(err, sql.ErrNoRows) and add "errors" to the imports.

internal/infrastructure/db/sqlite/asset_repo.go (1)
175-203: Use errors.Is for the sql.ErrNoRows check after withReadQuerier.

Same pattern as the other SQLite repos in this PR: withReadQuerier can wrap the query error with a closeConn error, making err == sql.ErrNoRows false even when the underlying cause is "no rows". Switch both sites (line 182 and 198) to errors.Is(err, sql.ErrNoRows) and add "errors" to imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/asset_repo.go` around lines 175 - 203, the sql.ErrNoRows comparisons in GetControlAsset and AssetExists can fail because withReadQuerier may wrap the error; update both checks to use errors.Is(err, sql.ErrNoRows) instead of direct equality and add the "errors" import, preserving the existing return behavior.

internal/infrastructure/db/sqlite/intent_fees_repo.go (1)
39-48: Use errors.Is for the sql.ErrNoRows sentinel check.

withReadQuerier may return fmt.Errorf("%w: %w", err, closeErr) when closeConn fails in addition to a query error, so the direct == comparison will miss a wrapped sql.ErrNoRows and return the generic error path instead of an empty IntentFees. scheduled_session_repo.go already uses errors.Is — matching that keeps behaviour consistent across repos.

🛠️ Proposed fix
  }); err != nil {
-     if err == sql.ErrNoRows {
+     if errors.Is(err, sql.ErrNoRows) {
          return &domain.IntentFees{}, nil
      }
      return nil, fmt.Errorf("failed to get intent fees: %w", err)
  }

Remember to add "errors" to the imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/intent_fees_repo.go` around lines 39 - 48, the sql.ErrNoRows sentinel comparison should use errors.Is to handle wrapped errors from withReadQuerier; update the error branch after the withReadQuerier call to check errors.Is(err, sql.ErrNoRows) and return &domain.IntentFees{}, nil in that case, and ensure the "errors" package is imported; keep the existing fmt.Errorf path for other errors and reference SelectLatestIntentFees and domain.IntentFees in the change.

internal/infrastructure/db/sqlite/utils.go (1)
256-271: Use a typed SQLite error check instead of substring matching.

The string match on "interrupted"/"sqlite_interrupt" works with modernc.org/sqlite today but is fragile to driver changes. The current context checks (context.Canceled, context.DeadlineExceeded) cover most real-world cases; the substring fallback only runs for driver-surfaced interrupt errors not caught by context checks.

Consider a type assertion to check the driver's typed error code instead:
Suggested implementation
var sqliteErr *sqlite.Error
if errors.As(err, &sqliteErr) {
    var code int
    if err2 := errors.Unwrap(err); err2 != nil {
        if se, ok := err2.(*sqlite.Error); ok {
            code = se.Code
        }
    } else if se, ok := err.(*sqlite.Error); ok {
        code = se.Code
    }
    if code == sqlite3.SQLITE_INTERRUPT {
        return true
    }
}

Or more simply via type assertion:
var sqliteErr *sqlite.Error
if ok := errors.As(err, &sqliteErr); ok && sqliteErr.Code == sqlite3.SQLITE_INTERRUPT {
    return true
}

This is a minor improvement — your doc comment already calls out the heuristic nature, and cancellation_test.go covers the behavior in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/utils.go` around lines 256 - 271, Update isInterruptError to prefer a typed driver error check instead of brittle substring matching: keep the existing context checks (context.Canceled, context.DeadlineExceeded and ctx.Err()) and then use errors.As to see if the error is a *sqlite.Error (or the driver-specific error type) and compare its Code against sqlite3.SQLITE_INTERRUPT (or the driver constant) to return true; remove or de-emphasize the string.contains checks for "interrupted"/"sqlite_interrupt". Ensure you reference the sqlite error type (e.g., *sqlite.Error) and the SQLITE_INTERRUPT constant when implementing the type assertion fallback in isInterruptError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/infrastructure/db/postgres/asset_repo.go`:
- Around line 145-150: The metadata parsing assigns both a value and the outer
err (ast.Metadata, err = asset.NewMetadataListFromString(...)) and thus can
leave ast.Metadata set to a partial/invalid value and clobber the loop's outer
err; change this to capture the parse error into a local variable (e.g.,
parseErr := asset.NewMetadataListFromString(row.Metadata.String) or metadata,
parseErr := ...), only assign ast.Metadata when parseErr == nil, and log
parseErr (not overwrite the outer err) — update the code paths around
NewMetadataListFromString, ast.Metadata, and row.Metadata to match the SQLite
pattern so the outer err is not reused and ast.Metadata is only set on
successful parse.
In `@internal/infrastructure/db/sqlite/asset_repo.go`:
- Around line 161-170: The loop silently skips rows when
new(big.Int).SetString(row.AssetAmount.String, 10) fails, which can hide data
corruption; change this to mirror the Postgres behavior by surfacing an error
instead of continuing: when SetString returns ok == false, return a descriptive
error (including the offending row/amount or index) from the enclosing function
rather than continuing, and ensure the error path propagates to callers; update
references around row.AssetAmount, new(big.Int).SetString, and
assets[idx].Supply to implement this error return so the two backends behave
consistently.
In `@internal/infrastructure/db/sqlite/sqlc/query.sql`:
- Around line 444-458: The LEFT JOIN in the SQL for
SelectAssetsWithUnspentAmountsByIds can yield NULL v.asset_amount on SQLite;
update the SELECT to normalize missing unspent amounts by wrapping
v.asset_amount with COALESCE (or equivalent) to return 0 instead of NULL so
consumers parsing the result see 0 consistently (keep the column alias
asset_amount unchanged).
In `@internal/test/e2e/e2e_test.go`:
- Around line 5440-5463: The retry loop around
time.Now().Before(batchInDbDeadline) currently calls require.NoError(t, err) and
require.Len(t, dbVtxos.Vtxos, len(outpoints)) on every iteration which aborts
instead of retrying; change the body so alice.Indexer().GetVtxos(ctx,
indexer.WithOutpoints(outpoints)) and its error/length checks are treated as
soft checks (if err != nil or len(dbVtxos.Vtxos) != len(outpoints) then
time.Sleep and continue) and only call require.NoError and require.Len after the
loop (once deadline is reached or success), keep polling
alice.Indexer().GetCommitmentTx and bob.Indexer().GetCommitmentTx similarly by
treating their errors as non-fatal within the loop; also fix the allSpent
computation to set allSpent := true and for each v do allSpent = allSpent &&
v.Spent (instead of assigning allSpent = v.Spent) so the invariant is explicit.
In `@internal/test/e2e/utils_test.go`:
- Around line 820-824: In the timeout branch of the select (the case
<-ctx.Done() that returns fmt.Errorf("timed out - %d/%d recieved", len(vtxos),
atLeastN)), fix the typo in the error string from "recieved" to "received" so
the message reads "timed out - %d/%d received" (referencing ctx.Done(),
len(vtxos) and atLeastN to locate the exact return).
📒 Files selected for processing (19)

- internal/infrastructure/db/postgres/asset_repo.go
- internal/infrastructure/db/postgres/sqlc/queries/query.sql.go
- internal/infrastructure/db/postgres/sqlc/query.sql
- internal/infrastructure/db/service.go
- internal/infrastructure/db/service_test.go
- internal/infrastructure/db/sqlite/asset_repo.go
- internal/infrastructure/db/sqlite/cancellation_test.go
- internal/infrastructure/db/sqlite/conviction_repo.go
- internal/infrastructure/db/sqlite/intent_fees_repo.go
- internal/infrastructure/db/sqlite/intent_txid_migration_test.go
- internal/infrastructure/db/sqlite/offchain_tx_repo.go
- internal/infrastructure/db/sqlite/round_repo.go
- internal/infrastructure/db/sqlite/scheduled_session_repo.go
- internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go
- internal/infrastructure/db/sqlite/sqlc/query.sql
- internal/infrastructure/db/sqlite/utils.go
- internal/infrastructure/db/sqlite/vtxo_repo.go
- internal/test/e2e/e2e_test.go
- internal/test/e2e/utils_test.go
Arkana Code Review — #1043 (follow-up on new commits)
Reviewer: Arkana (aggressive review mode)
Verdict: ✅ Approve — previous critical findings addressed. Remaining items are low severity.
Previous findings — status
| # | Previous finding | Status |
|---|---|---|
| 1 | 🔴 execTx returns error after successful commit | ✅ Fixed — now logs warning, doesn't propagate (utils.go:182) |
| 2 | 🟡 Missing PRAGMA busy_timeout | ✅ Fixed — set to 5000ms on both read and write pools (utils.go:91-101) |
| 3 | 🟢 isInterruptError string heuristic | ✅ Fixed — now uses proper sqlite.Error code check against SQLITE_INTERRUPT (utils.go:283-288) |
| 4 | 🟡 Postgres AssetAmount as interface{} | Open — see below |
New observations on commit 344f2a7d
🟢 LOW — Copy-paste in busy_timeout error messages
internal/infrastructure/db/sqlite/utils.go:93,99 — Both busy_timeout PRAGMA error paths say "failed to enable WAL" instead of "failed to set busy_timeout". Non-functional but confusing when debugging.
🟢 LOW — Postgres AssetAmount still interface{}
internal/infrastructure/db/postgres/sqlc/queries/query.sql.go — The Postgres generated code still types AssetAmount as interface{}, requiring the stringifyAssetAmount type-switch in postgres/asset_repo.go. A sqlc column override (e.g., column: "*.asset_amount" go_type: "string") would make this type-safe and align it with the SQLite version which correctly gets string. Not blocking, but worth a follow-up.
🟢 LOW — err == sql.ErrNoRows vs errors.Is
sqlite/asset_repo.go:178,194 and sqlite/conviction_repo.go:45,79 — Direct == comparisons with sql.ErrNoRows after withReadQuerier. If closeConn also fails (very unlikely on non-interrupt path), the error wrapping in withReadQuerier (fmt.Errorf("%w: %w", err, closeErr)) would prevent the sentinel from matching. Extremely low probability in practice since closeConn(conn, false) rarely fails, but errors.Is would be more correct. CodeRabbit also flagged this.
✅ GOOD — COALESCE(v.asset_amount, '0') on SQLite query
The SQLite query now uses COALESCE to default NULL amounts to '0', making AssetAmount a non-nullable string. This eliminates the previous sql.NullString handling and the silent continue on invalid amounts — replaced with an explicit error return (asset_repo.go:163). Clean improvement.
✅ GOOD — Metadata parse error handling (Postgres)
postgres/asset_repo.go:148-153 — Fixed to use separate parseErr variable and only assign metadata on success. Previously the err variable shadowed the outer scope, which could mask the parse error. Good catch.
✅ GOOD — Test improvements
Test error messages now use completed (1-indexed) instead of i (0-indexed), batch interval check simplified, typo fix. All cosmetic but improve debuggability.
⚠️ PROTOCOL-CRITICAL — Human review still required
Previous review flagged vtxo_repo.go and round_repo.go changes. Those files were not modified in this commit, but the overall PR still touches protocol-critical VTXO and round lifecycle paths via the withReadQuerier/withWriteQuerier refactor. Human sign-off still needed before merge.
Summary
The commit cleanly addresses the three actionable findings from the previous review. The execTx commit-success bug (🔴) is fixed, busy_timeout is set, and interrupt detection is now type-safe. Remaining items are low severity and can be addressed in follow-ups. Architecture is sound.
🤖 Reviewed by Arkana
344f2a7 to ce43d36 (force-push)
Arkana Code Review — #1043 (commit ce43d36)
Reviewer: Arkana (aggressive review mode)
Verdict: ✅ Approve — all critical/medium findings from previous reviews are resolved.
Previous findings — final status
| # | Finding | Severity | Status |
|---|---|---|---|
| 1 | execTx returns error after successful commit | 🔴 HIGH | ✅ Fixed — logs warning, no longer propagates error |
| 2 | Missing PRAGMA busy_timeout | 🟡 MEDIUM | ✅ Fixed — set to 5000ms on both read and write pools |
| 3 | isInterruptError string heuristic | 🟢 LOW | ✅ Fixed — proper sqlite.Error code check against SQLITE_INTERRUPT |
| 4 | Postgres metadata err shadowing | 🟢 LOW | ✅ Fixed — separate parseErr variable with else guard |
| 5 | SQLite AssetAmount NULL handling | 🟢 LOW | ✅ Fixed — COALESCE(v.asset_amount, '0') makes column non-nullable, silent continue replaced with explicit error |
Remaining nits (non-blocking)
🟢 LOW — busy_timeout error messages still say "failed to enable WAL"
sqlite/utils.go — Both PRAGMA busy_timeout error paths still use the message "failed to enable WAL" instead of "failed to set busy_timeout". Copy-paste oversight. Won't cause bugs but will confuse anyone debugging a busy_timeout failure.
🟢 LOW — err == sql.ErrNoRows vs errors.Is inconsistency
sqlite/asset_repo.go, sqlite/conviction_repo.go, sqlite/intent_fees_repo.go — Some sentinel checks use direct == comparison while others (round_repo, vtxo_repo, settings_repo, scheduled_session_repo) correctly use errors.Is. The withReadQuerier wrapper can join errors via fmt.Errorf("%w: %w", err, closeErr), which would prevent == from matching. Extremely unlikely in practice (closeConn rarely fails on non-interrupt path), but errors.Is is the correct Go idiom. Worth a consistency pass.
🟢 LOW — Postgres AssetAmount still typed as interface{}
postgres/sqlc/queries/query.sql.go — A sqlc column override would make this type-safe. Follow-up item.
✅ Good changes in this commit
- Test error messages now correctly use the 1-indexed completed counter
- Batch interval check simplified from i != 0 && i%batchInterval == 0 to completed%batchInterval == 0
- Typo fix "recieved" → "received"
- waitForVTXOs signature reformatted for readability
⚠️ PROTOCOL-CRITICAL — Human review still required
This PR refactors the database layer used by vtxo_repo.go and round_repo.go through the withReadQuerier/withWriteQuerier abstraction. These are protocol-critical paths (VTXO lifecycle, round management). While the refactor is sound and well-tested, human sign-off is required before merge per protocol.
🤖 Reviewed by Arkana
Actionable comments posted: 2
♻️ Duplicate comments (1)
internal/test/e2e/e2e_test.go (1)
5442-5465: ⚠️ Potential issue | 🟠 Major — Retry loop still aborts on the first iteration (previous finding not addressed).

require.NoError(t, err) and require.Len(t, dbVtxos.Vtxos, len(outpoints)) are called unconditionally inside the for time.Now().Before(batchInDbDeadline) poll, so the very first iteration that observes the indexer mid-sync will fail the whole test instead of sleeping and retrying — defeating the 5s deadline. Under CI load this will manifest as flakes rather than real bugs in the system under test. Also the allSpent = v.Spent assignment inside the loop only works because of the break; making it allSpent = allSpent && v.Spent states the invariant directly.

🛠️ Proposed fix
  for time.Now().Before(batchInDbDeadline) {
      aliceCtx, aliceGetCtxErr = alice.Indexer().
          GetCommitmentTx(ctx, aliceSettleRes.CommitmentTxid)
      bobCtx, bobGetCtxErr = bob.Indexer().
          GetCommitmentTx(ctx, bobSettleRes.CommitmentTxid)
-     dbVtxos, err := alice.Indexer().GetVtxos(ctx, indexer.WithOutpoints(outpoints))
-
-     require.NoError(t, err)
-     require.Len(t, dbVtxos.Vtxos, len(outpoints))
+     dbVtxos, err := alice.Indexer().GetVtxos(ctx, indexer.WithOutpoints(outpoints))
+     if err != nil || len(dbVtxos.Vtxos) != len(outpoints) {
+         time.Sleep(100 * time.Millisecond)
+         continue
+     }
      allSpent := true
      for _, v := range dbVtxos.Vtxos {
-         allSpent = v.Spent
-         if !allSpent {
+         allSpent = allSpent && v.Spent
+         if !allSpent {
              break
          }
      }
      if aliceGetCtxErr == nil && bobGetCtxErr == nil && allSpent {
          break
      }
      time.Sleep(100 * time.Millisecond)
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/test/e2e/e2e_test.go` around lines 5442 - 5465, The retry loop aborts early because require.NoError(t, err) and require.Len(t, dbVtxos.Vtxos, len(outpoints)) are asserted unconditionally inside the polling loop; change the logic in the loop that calls alice.Indexer().GetCommitmentTx / bob.Indexer().GetCommitmentTx and alice.Indexer().GetVtxos to not call require.* until success criteria are met — instead check errors and lengths with if statements (e.g., if err != nil || len(dbVtxos.Vtxos) != len(outpoints) { time.Sleep(...); continue }) so the loop can retry until batchInDbDeadline, and after the loop (once aliceGetCtxErr==nil && bobGetCtxErr==nil && allSpent) call require.NoError and require.Len as assertions; also fix the v.Spent aggregation by replacing allSpent = v.Spent with allSpent = allSpent && v.Spent inside the vtxo iteration to correctly preserve the invariant.
🧹 Nitpick comments (4)
internal/infrastructure/db/sqlite/asset_repo.go (1)
172-200: Consider errors.Is(err, sql.ErrNoRows) for the wrapped-error path.

withReadQuerier only returns the inner err unchanged when closeConn succeeds; on the rare path where connection close also fails, it wraps with fmt.Errorf("%w: %w", err, closeErr), at which point err == sql.ErrNoRows no longer matches. Using errors.Is here (and in GetControlAsset) makes the ErrNoRows branch robust against that wrapping without any behavior change in the happy path.

♻️ Proposed change
- }); err != nil {
-     if err == sql.ErrNoRows {
+ }); err != nil {
+     if errors.Is(err, sql.ErrNoRows) {
          return "", fmt.Errorf("no control asset found")
      }

Analogous change at line 194.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/asset_repo.go` around lines 172 - 200, replace direct equality checks against sql.ErrNoRows with errors.Is to handle wrapped errors: in the GetControlAsset flow (where SelectControlAssetByID is called via withReadQuerier) and in AssetExists (where SelectAssetExists is called) change the if branches that currently test `err == sql.ErrNoRows` to use `errors.Is(err, sql.ErrNoRows)`, and add the standard "errors" import; keep all other behavior unchanged.

internal/test/e2e/e2e_test.go (2)
5186-5194: Sanity check on the reused closeFn identifier.

At line 5191, _, bobEvtCh, closeFn, err := ... reassigns closeFn (only bobEvtCh is newly declared; the rest are in scope from line 5187). Each defer closeFn() evaluates the function value at defer-statement time, so the test registers alice's close at line 5189 and bob's close at line 5193 — both will fire in reverse order at test exit. This is correct but subtle; a quick rename (e.g. closeAliceSub / closeBobSub) would make the intent obvious to future readers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/test/e2e/e2e_test.go` around lines 5186 - 5194, The defer uses the same identifier closeFn for both subscriptions which is legal but confusing; update the two NewSubscription calls so each returns a distinct closer variable (e.g. closeAliceSub and closeBobSub) instead of reusing closeFn, and update the corresponding defer calls to defer closeAliceSub() and defer closeBobSub(); this clarifies the subscriptions created via alice.Indexer().NewSubscription(...) and bob.Indexer().NewSubscription(...) and avoids subtle scope/reassignment confusion.
5268-5288: Classifying `codes.Internal` with substring `"context"` is fragile.

Matching `strings.Contains(errMsg, "context")` against gRPC `Internal` messages is a reasonable pragmatic workaround for the server reflecting cancellation as Internal, but it will silently swallow future unrelated Internal errors that happen to mention the word "context" (e.g. any message referring to "user context", "commitment context", etc.). Once the server-side behavior is settled, consider tightening this to a more specific substring (e.g. `"context canceled"`/`"context deadline exceeded"`) so genuine Internal bugs still fail the test.
Verify each finding against the current code and only fix it if needed. In `@internal/test/e2e/e2e_test.go` around lines 5268 - 5288, The test currently treats any gRPC codes.Internal error whose message contains the substring "context" as a cancellation by checking st.Code() == codes.Internal and using strings.Contains(errMsg, "context"); change this to detect specific cancellation phrases instead (e.g., check for "context canceled" or "context deadline exceeded" after lowercasing the message) inside the same Internal branch so only true context cancellations call canceledAssetCalls.Add(1) and continue, while other Internal errors are propagated via the errCh as before.internal/infrastructure/db/sqlite/utils.go (1)
130-188: Pinned-connection transaction / interrupt handling looks correct.

Acquiring a dedicated `*sql.Conn` for the entire transaction, driving the interrupt-eviction through `conn.Raw(func(any) error { return driver.ErrBadConn })`, and special-casing `sql.ErrConnDone` on the subsequent `Close()` is the idiomatic way to force `database/sql` to drop a tainted connection from the pool. The retry loop also correctly re-acquires a fresh connection on every iteration, so a previously-interrupted connection cannot leak into subsequent attempts.

One small nit (non-blocking): when `closeConn` fails after a `txBody` or `Commit` error, the returned `closeErr` discards the original error. `withReadQuerier`/`withWriteQuerier` already wrap with `"%w: %w"` — consider doing the same here for symmetry, since the underlying txBody error is usually the more actionable one.
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/utils.go` around lines 130 - 188, The current execTx returns closeConn's error and discards the original txBody or Commit error when closeConn fails; update execTx (referencing execTx, txBody, closeConn, isInterruptError, isConflictError) to preserve the original error by wrapping the closeConn error with the original (e.g., return a wrapped error that includes both the original transaction error and the closeConn error) in both places where you currently do `if closeErr := closeConn(...); closeErr != nil { return closeErr }`, so callers get the actionable tx/commit error plus the close failure details.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/infrastructure/db/sqlite/utils.go`:
- Around line 87-97: The error handling for the busy_timeout PRAGMA is
incorrect: both writeDB.Exec and readDB.Exec wrap failures with "failed to
enable WAL" due to copy/paste; update the fmt.Errorf messages in the
busy_timeout blocks (the calls to writeDB.Exec(`PRAGMA busy_timeout = 5000;`)
and readDB.Exec(`PRAGMA busy_timeout = 5000;`)) to return a clear, specific
error like "failed to set busy_timeout" (including the wrapped err) so logs
accurately reflect the failing PRAGMA operation.
---
Duplicate comments:
In `@internal/test/e2e/e2e_test.go`:
- Around line 5442-5465: The retry loop aborts early because require.NoError(t,
err) and require.Len(t, dbVtxos.Vtxos, len(outpoints)) are asserted
unconditionally inside the polling loop; change the logic in the loop that calls
alice.Indexer().GetCommitmentTx / bob.Indexer().GetCommitmentTx and
alice.Indexer().GetVtxos to not call require.* until success criteria are met —
instead check errors and lengths with if statements (e.g., if err != nil ||
len(dbVtxos.Vtxos) != len(outpoints) { time.Sleep(...); continue }) so the loop
can retry until batchInDbDeadline, and after the loop (once aliceGetCtxErr==nil
&& bobGetCtxErr==nil && allSpent) call require.NoError and require.Len as
assertions; also fix the v.Spent aggregation by replacing allSpent = v.Spent
with allSpent = allSpent && v.Spent inside the vtxo iteration to correctly
preserve the invariant.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 82024106-18d1-4c5c-8e96-48cff149e1de
📒 Files selected for processing (7)
- internal/infrastructure/db/postgres/asset_repo.go
- internal/infrastructure/db/sqlite/asset_repo.go
- internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go
- internal/infrastructure/db/sqlite/sqlc/query.sql
- internal/infrastructure/db/sqlite/utils.go
- internal/test/e2e/e2e_test.go
- internal/test/e2e/utils_test.go
✅ Files skipped from review due to trivial changes (1)
- internal/infrastructure/db/sqlite/sqlc/query.sql
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go
- internal/test/e2e/utils_test.go
Arkana Code Review — #1043 (commits 8b893ce..8c05582)
Reviewer: Arkana (aggressive review mode)
Scope: 4 new commits since last review at ce43d36
Changes look solid. Approving with minor nits.
What changed since last review:
- `GetSweepTxs` → `withReadQuerier` (round_repo.go:424-430): consistent with all other read paths. ✅
- Options pattern for `OpenDb` (utils.go:42-70): clean functional-options API. WAL + busy_timeout move from runtime PRAGMAs to DSN `_pragma=` parameters — this is actually more correct because pragmas are now applied per-connection by the driver, not just at pool-open time. New pool connections get them automatically. ✅
- Error preservation in `execTx` (utils.go:185, 201): previous code swallowed the original error on `closeConn` failure (`return closeErr`). Now wraps both: `fmt.Errorf("%w: %w", err, closeErr)`. Good fix. ✅
- `errors.Is` for `sql.ErrNoRows` (sqlite/asset_repo.go:179, 195): correct — handles wrapped sentinels. ✅
- Postgres type simplification (postgres/asset_repo.go:161-163, query.sql:451): `COALESCE(v.asset_amount, 0)::TEXT` coalesces the numeric first, then casts — same semantics, but now sqlc generates `string` instead of `interface{}`, eliminating the `stringifyAssetAmount` helper. Clean. ✅
- `buildDSN` function (utils.go:133-145): simple, correct for modernc.org/sqlite v1.33.1, which supports `_pragma=` DSN params.
Nits (non-blocking):
- Inconsistent `ErrNoRows` comparison — `sqlite/asset_repo.go:67` still uses `err != sql.ErrNoRows` (bare equality). The same pattern exists in `postgres/asset_repo.go:69,174,188`. Not introduced by this PR, but since you're already fixing the sqlite side, consider a follow-up to catch all of them.
- No direct unit test for `buildDSN` — it is tested indirectly via `OpenDb` calls, but a table-driven test would be cheap insurance against DSN encoding bugs (e.g. special characters in paths, multiple-pragma ordering).
Verdict
Not protocol-critical (no VTXO/signing/forfeit/exit path changes). DB connection hygiene + query optimization only. Previous review concerns are addressed. Ship it.
🤖 Reviewed by Arkana
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/infrastructure/db/sqlite/vtxo_repo.go (1)
259-276: ⚠️ Potential issue | 🟡 Minor — Inconsistent error handling can silently zero out recoverable liquidity.

`GetExpiringLiquidity` (L249-252) returns an error when the driver returns an unexpected type, but `GetRecoverableLiquidity` swallows the same condition as `(0, nil)`. Both queries have the same shape (`COALESCE(SUM(amount), 0)`), so they should handle an unexpected type the same way — otherwise a driver/sqlc quirk that surfaces in one path will silently misreport recoverable liquidity in the other.

🛠 Proposed fix

```diff
 n, ok := amount.(int64)
 if !ok {
-	return 0, nil
+	return 0, fmt.Errorf("unexpected sqlite amount type: %T", amount)
 }
 if n < 0 {
 	return 0, fmt.Errorf("data integrity issue: got negative value %d", n)
 }
 return uint64(n), nil
```
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/vtxo_repo.go` around lines 259 - 276, GetRecoverableLiquidity currently swallows a type-assertion failure (amount.(int64)) and returns (0, nil), which is inconsistent with GetExpiringLiquidity; change GetRecoverableLiquidity to return an error when the driver returns an unexpected type. Locate GetRecoverableLiquidity and the call to SelectRecoverableLiquidityAmount and mirror the behavior of GetExpiringLiquidity: detect when amount is not an int64 and return a descriptive error (e.g., "unexpected type for recoverable liquidity"), keep the negative-value check, and only convert to uint64 on a successful int64 assertion.
🧹 Nitpick comments (1)
internal/infrastructure/db/sqlite/intent_fees_repo.go (1)
37-48: Use `errors.Is(err, sql.ErrNoRows)` for consistency and wrap-safety.

Direct equality will break the "missing row returns empty `IntentFees`" contract if `withReadQuerier` ever wraps the error. `scheduledSessionRepository.Get` already uses `errors.Is` for the same case — keep it consistent.

♻️ Proposed change

```diff
 import (
 	"context"
 	"database/sql"
+	"errors"
 	"fmt"
 	...
 )
```

```diff
 }); err != nil {
-	if err == sql.ErrNoRows {
+	if errors.Is(err, sql.ErrNoRows) {
 		return &domain.IntentFees{}, nil
 	}
 	return nil, fmt.Errorf("failed to get intent fees: %w", err)
 }
```
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/intent_fees_repo.go` around lines 37 - 48, In GetIntentFees on intentFeesRepo, replace the direct equality check (if err == sql.ErrNoRows) with a wrap-safe check using errors.Is(err, sql.ErrNoRows) so that errors wrapped by withReadQuerier are still recognized as “no rows” and return an empty domain.IntentFees; update the import usage if needed to ensure errors is available and keep the rest of the error wrapping (fmt.Errorf) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 11cfe9eb-62a3-4492-b702-455aace5520c
📒 Files selected for processing (19)
- internal/infrastructure/db/postgres/asset_repo.go
- internal/infrastructure/db/postgres/sqlc/queries/query.sql.go
- internal/infrastructure/db/postgres/sqlc/query.sql
- internal/infrastructure/db/service.go
- internal/infrastructure/db/service_test.go
- internal/infrastructure/db/sqlite/asset_repo.go
- internal/infrastructure/db/sqlite/cancellation_test.go
- internal/infrastructure/db/sqlite/conviction_repo.go
- internal/infrastructure/db/sqlite/intent_fees_repo.go
- internal/infrastructure/db/sqlite/intent_txid_migration_test.go
- internal/infrastructure/db/sqlite/offchain_tx_repo.go
- internal/infrastructure/db/sqlite/round_repo.go
- internal/infrastructure/db/sqlite/scheduled_session_repo.go
- internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go
- internal/infrastructure/db/sqlite/sqlc/query.sql
- internal/infrastructure/db/sqlite/utils.go
- internal/infrastructure/db/sqlite/vtxo_repo.go
- internal/test/e2e/e2e_test.go
- internal/test/e2e/utils_test.go
✅ Files skipped from review due to trivial changes (1)
- internal/infrastructure/db/sqlite/conviction_repo.go
🚧 Files skipped from review as they are similar to previous changes (7)
- internal/infrastructure/db/postgres/sqlc/query.sql
- internal/infrastructure/db/service_test.go
- internal/infrastructure/db/sqlite/offchain_tx_repo.go
- internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go
- internal/infrastructure/db/sqlite/intent_txid_migration_test.go
- internal/infrastructure/db/sqlite/asset_repo.go
- internal/infrastructure/db/postgres/asset_repo.go
Arkana Code Review — #1043 (commit ebf5df7)
Reviewer: Arkana (aggressive review mode)
New commit: ebf5df7c — chore: golangci-lint fmt
Formatting-only change (line wrapping in round_repo.go, utils.go, vtxo_repo.go). No logic changes. ✅ No concerns with this commit.
Prior findings still open
Two items from my previous reviews remain unaddressed:
- `vtxo_repo.go:269-271` — `GetRecoverableLiquidity` silently swallows type-assertion failures.
  When `amount.(int64)` fails, it returns `(0, nil)` — hiding a data/driver bug as a zero balance. `GetExpiringLiquidity` correctly returns an error in this case. These should be consistent. Suggested fix:
  ```go
  n, ok := amount.(int64)
  if !ok {
      return 0, fmt.Errorf("unexpected type %T for recoverable liquidity amount", amount)
  }
  ```
- `intent_fees_repo.go:47` — `err == sql.ErrNoRows` should be `errors.Is(err, sql.ErrNoRows)`.
  `withReadQuerier` may wrap errors, breaking the direct equality check. This would cause "no rows" to surface as a hard error instead of returning an empty `IntentFees`.
Neither is a ship-blocker but both are latent bugs that will bite under the right conditions. Would be good to address before merge.
🤖 Reviewed by Arkana
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Chris Ricketts <6156768+chris-ricketts@users.noreply.github.com>
WAL mode not supported in DSN for in-memory databases used in tests
Force-pushed from 0251ddb to a700225
Arkana Code Review — #1043 (commits 21df1f7..a700225)
Reviewer: Arkana (aggressive review mode)
New commits since last review
21df1f73 — fix: rabbit / arkana comments
- `intent_fees_repo.go:47` — `errors.Is(err, sql.ErrNoRows)` replaces the bare `==` check. ✅ Correct — `withReadQuerier` may wrap errors, so `errors.Is` is required for reliable sentinel matching.
- `vtxo_repo.go:270` — `GetRecoverableLiquidity` now returns an error on an unexpected type assertion instead of silently returning 0. ✅ Good — silent nil on type mismatch was a latent bug that could mask data corruption.
- `e2e_test.go` — fixed indentation in `TestUnrolledVtxoRejoinBatch` (was mixing tabs/spaces inside `t.Run`). Pure formatting, no logic change. ✅
3b1e901e — chore: fmt
Formatting only. ✅
a700225c — ci: increase TestGetAssetQueryChurn timeouts/deadlines
- `recvVtxosTimeout`: 10s → 20s
- `vtxosInDBDeadline`: 5s → 10s
- `batchInDbDeadline`: 5s → 10s
Reasonable for CI flake reduction. ✅
Verdict
All three commits address prior review feedback cleanly. No new logic, no protocol-critical changes, no cross-repo impact. LGTM.
🤖 Reviewed by Arkana
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/infrastructure/db/sqlite/vtxo_repo.go (1)
558-597: ⚠️ Potential issue | 🟡 Minor — Duplicate outpoints in input would now produce duplicate vtxos.

The previous call graph collected rows first and ran them through `readRows` (which de-duplicates by `txid:vout` via its internal map); this version calls `readRows` per outpoint and appends, so if `outpoints` ever contained duplicates they would no longer be collapsed. In practice callers pass distinct outpoints so this is unlikely to bite, but worth a defensive de-dup of the input or a map-based accumulator here if duplicates are possible.
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/vtxo_repo.go` around lines 558 - 597, GetPendingSpentVtxosWithOutpoints currently calls readRows per outpoint which lets duplicate outpoints produce duplicate vtxos; change it to deduplicate by txid:vout before querying or accumulate results into a map keyed by txid:vout after each readRows call to ensure unique vtxos. Specifically, inside GetPendingSpentVtxosWithOutpoints (and around the loop that calls queries.SelectPendingSpentVtxo and readRows), either 1) build a set of unique outpoints (using Txid+Vout as the key) and iterate that set, or 2) keep a map[string]domain.Vtxo (keyed by fmt.Sprintf("%s:%d", outpoint.Txid, outpoint.VOut)) and insert results from readRows into the map, then convert the map values to the vtxos slice before sorting; this preserves behavior of readRows while preventing duplicates.
♻️ Duplicate comments (1)
internal/test/e2e/e2e_test.go (1)
5711-5734: ⚠️ Potential issue | 🟠 Major — Retry loop still short-circuits on the first transient miss — same issue as before.

`require.NoError(t, err)` and `require.Len(t, dbVtxos.Vtxos, len(outpoints))` on every iteration make the 10s deadline meaningless — a single early poll where the indexer hasn't caught up fails the whole test. The `allSpent = v.Spent` assignment also still overwrites the previous result each iteration rather than accumulating an AND, relying on `break` semantics to end up with the right value.

This is the same structural issue I flagged on a previous commit at these lines; it doesn't appear to have been addressed in this revision.

🛠 Suggested fix

```diff
-	for time.Now().Before(batchInDbDeadline) {
-		aliceCtx, aliceGetCtxErr = alice.Indexer().
-			GetCommitmentTx(ctx, aliceSettleRes.CommitmentTxid)
-		bobCtx, bobGetCtxErr = bob.Indexer().
-			GetCommitmentTx(ctx, bobSettleRes.CommitmentTxid)
-
-		dbVtxos, err := alice.Indexer().GetVtxos(ctx, indexer.WithOutpoints(outpoints))
-
-		require.NoError(t, err)
-		require.Len(t, dbVtxos.Vtxos, len(outpoints))
-
-		allSpent := true
-		for _, v := range dbVtxos.Vtxos {
-			allSpent = v.Spent
-			if !allSpent {
-				break
-			}
-		}
-
-		if aliceGetCtxErr == nil && bobGetCtxErr == nil && allSpent {
-			break
-		}
-		time.Sleep(100 * time.Millisecond)
-	}
+	for time.Now().Before(batchInDbDeadline) {
+		aliceCtx, aliceGetCtxErr = alice.Indexer().
+			GetCommitmentTx(ctx, aliceSettleRes.CommitmentTxid)
+		bobCtx, bobGetCtxErr = bob.Indexer().
+			GetCommitmentTx(ctx, bobSettleRes.CommitmentTxid)
+
+		dbVtxos, getVtxosErr := alice.Indexer().
+			GetVtxos(ctx, indexer.WithOutpoints(outpoints))
+		if getVtxosErr != nil || len(dbVtxos.Vtxos) != len(outpoints) {
+			time.Sleep(100 * time.Millisecond)
+			continue
+		}
+
+		allSpent := true
+		for _, v := range dbVtxos.Vtxos {
+			if !v.Spent {
+				allSpent = false
+				break
+			}
+		}
+
+		if aliceGetCtxErr == nil && bobGetCtxErr == nil && allSpent {
+			break
+		}
+		time.Sleep(100 * time.Millisecond)
+	}
```
Verify each finding against the current code and only fix it if needed. In `@internal/test/e2e/e2e_test.go` around lines 5711 - 5734, The retry loop currently fails fast because require.NoError and require.Len are called inside the loop and the allSpent assignment overwrites rather than accumulates; change it to retry on transient misses by removing require.NoError and require.Len from inside the loop and instead capture errors/length into locals (e.g., aliceGetCtxErr, bobGetCtxErr, dbVtxosErr, gotLen) and only assert after the loop finishes or the success condition is met; compute allSpent by accumulating with AND (allSpent = allSpent && v.Spent) when iterating dbVtxos.Vtxos, and only break when alice.Indexer().GetCommitmentTx and bob.Indexer().GetCommitmentTx both return nil and allSpent is true, then call require.NoError and require.Len once after the loop to fail the test if the deadline hit.
🧹 Nitpick comments (3)
internal/infrastructure/db/sqlite/intent_txid_migration_test.go (1)
16-17: Shared in-memory DSN — correct choice, with a small caveat.
`file::memory:?cache=shared` is the right pattern so the separate read/write `*sql.DB` pools see the same in-memory database. Just be aware that any other test in this package using the same unadorned `file::memory:` URI would share state across tests in the same binary. If more tests adopt this pattern later, consider giving each test a unique in-memory name, e.g. `file:testname_%d?mode=memory&cache=shared`, to avoid cross-test contamination.
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/intent_txid_migration_test.go` around lines 16 - 17, The DSN used for the shared in-memory SQLite must include the cache parameter so separate *sql.DB pools see the same DB; update the call to sqlitedb.OpenDb (and any tests using the same pattern) to use "file::memory:?cache=shared" (or a per-test unique name like "file:testname_%d?mode=memory&cache=shared" to avoid cross-test contamination) and keep the sqlitedb.WithSharedCache() option as-is when creating the DB.internal/infrastructure/db/sqlite/utils.go (1)
269-323: Connection discard + interrupt classification look correct.
`closeConn` uses the documented `conn.Raw` idiom to force the pool to treat the underlying driver connection as bad (`driver.ErrBadConn`), and the subsequent `sql.ErrConnDone` from the handle's `Close` is correctly swallowed when `discard` is set. `isInterruptError` handles `context.Canceled`, `context.DeadlineExceeded`, and `*sqlite.Error` with the SQLite interrupt code.

One tiny nit in `isInterruptError`: `code == sqlite3.SQLITE_INTERRUPT || code&0xff == sqlite3.SQLITE_INTERRUPT` — the first disjunct is always subsumed by the second (the low byte of the primary result code equals the primary code itself), so the first can be dropped. Purely cosmetic.
Verify each finding against the current code and only fix it if needed. In `@internal/infrastructure/db/sqlite/utils.go` around lines 269 - 323, In isInterruptError, remove the redundant comparison "code == sqlite3.SQLITE_INTERRUPT" and keep only the low-byte check "code&0xff == sqlite3.SQLITE_INTERRUPT" to simplify the condition; update the return in isInterruptError to use only the latter check (refer to the isInterruptError function and the sqliteErr.Code() usage) and run tests to confirm no behavior change.internal/test/e2e/e2e_test.go (1)
5643-5657: Abortive `require.NoError` inside a polling loop.

Same anti-pattern as the batch-DB polling loop a few blocks down, just milder: a single transient `GetVtxos` error inside the 10s wait kills the test instead of triggering a retry. Under the stress this test intentionally produces (WAL churn + cancellations driving the very codepath under test), transient cancellation/interrupt errors are exactly what you want to tolerate here rather than fail on.

🛠 Suggested fix

```diff
 	for time.Now().Before(vtxosInDBDeadline) {
-		res, err := alice.Indexer().
+		res, getErr := alice.Indexer().
 			GetVtxos(ctx, indexer.WithOutpoints(outpoints))
-		require.NoError(t, err)
-
-		if len(res.Vtxos) == len(outpoints) {
-			vtxos := res.Vtxos
-			for _, v := range vtxos {
+		if getErr == nil && len(res.Vtxos) == len(outpoints) {
+			for _, v := range res.Vtxos {
 				dbVtxos[v.Outpoint] = v
 			}
 			break
 		}
 		time.Sleep(100 * time.Millisecond)
 	}
```
Verify each finding against the current code and only fix it if needed. In `@internal/test/e2e/e2e_test.go` around lines 5643 - 5657, The polling loop around vtxosInDBDeadline aborts the whole test on any transient error because it calls require.NoError(t, err); change this to tolerate transient errors by replacing require.NoError with logic that logs/ignores transient errors and continues retrying until deadline or a non-transient failure: call alice.Indexer().GetVtxos(ctx, indexer.WithOutpoints(outpoints)) inside the same loop, check err and if it represents a transient cancellation/interrupt (or context.Canceled/context.DeadlineExceeded) then continue after the time.Sleep, otherwise fail the test via require.NoError or require.FailNow; keep the existing success path that fills dbVtxos when len(res.Vtxos)==len(outpoints).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2fb1a897-862d-4b84-bba9-d906ce8c0211
📒 Files selected for processing (19)
- internal/infrastructure/db/postgres/asset_repo.go
- internal/infrastructure/db/postgres/sqlc/queries/query.sql.go
- internal/infrastructure/db/postgres/sqlc/query.sql
- internal/infrastructure/db/service.go
- internal/infrastructure/db/service_test.go
- internal/infrastructure/db/sqlite/asset_repo.go
- internal/infrastructure/db/sqlite/cancellation_test.go
- internal/infrastructure/db/sqlite/conviction_repo.go
- internal/infrastructure/db/sqlite/intent_fees_repo.go
- internal/infrastructure/db/sqlite/intent_txid_migration_test.go
- internal/infrastructure/db/sqlite/offchain_tx_repo.go
- internal/infrastructure/db/sqlite/round_repo.go
- internal/infrastructure/db/sqlite/scheduled_session_repo.go
- internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go
- internal/infrastructure/db/sqlite/sqlc/query.sql
- internal/infrastructure/db/sqlite/utils.go
- internal/infrastructure/db/sqlite/vtxo_repo.go
- internal/test/e2e/e2e_test.go
- internal/test/e2e/utils_test.go
✅ Files skipped from review due to trivial changes (1)
- internal/infrastructure/db/sqlite/cancellation_test.go
🚧 Files skipped from review as they are similar to previous changes (9)
- internal/infrastructure/db/service_test.go
- internal/infrastructure/db/postgres/sqlc/queries/query.sql.go
- internal/test/e2e/utils_test.go
- internal/infrastructure/db/sqlite/intent_fees_repo.go
- internal/infrastructure/db/postgres/sqlc/query.sql
- internal/infrastructure/db/sqlite/offchain_tx_repo.go
- internal/infrastructure/db/sqlite/scheduled_session_repo.go
- internal/infrastructure/db/sqlite/round_repo.go
- internal/infrastructure/db/sqlite/asset_repo.go
```go
if len(res.Vtxos) == len(outpoints) {
	vtxos := res.Vtxos
	for _, v := range vtxos {
		dbVtxos[v.Outpoint] = v
	}
	break
}
```
the polling loop stops as soon as it sees all VTXOs in the db, but at that point the spent-marking write may not have finished yet, which i think is what's causing the flaky "failed to update spent vtxo in db" in CI. would it make sense to make the loop wait for both the count and the spent state before breaking?
```go
tx, err := conn.BeginTx(ctx, nil)
if err != nil {
	_ = closeConn(conn, isInterruptError(ctx, err))
```
looks good overall! one thing i noticed: since the write pool only has 1 connection, if closeConn fails silently after a BeginTx error, that connection is never returned, and every future write blocks forever. what do you think about at least logging the error here instead of dropping it with `_`?
Summary
Testing
Summary by CodeRabbit
Bug Fixes
Performance
Reliability
Tests