Performance improvements in holding calculation pipeline#1579
wps260 wants to merge 4 commits into we-promise:main from
Conversation
| Dimension | Score | What it measures |
|---|---|---|
| Identity | 25 | Account age, contribution history, GPG keys, org memberships |
| Behavior | 90 | PR patterns, unsolicited contribution ratio, activity cadence |
| Content | 100 | PR body substance, issue linkage, contribution quality |
| Graph | 30 | Cross-repo trust, co-contributor relationships |
Analyzed by Brin · Full profile
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration: defaults · Review profile: CHILL · Plan: Pro
📒 Files selected for processing (6)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 Walkthrough

Replace repeated linear scans with indexed lookups, and switch array accumulation from `+=` to in-place `concat`.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
🧹 Nitpick comments (3)
test/models/holding/reverse_calculator_test.rb (1)
189-190: Consider tolerant numeric assertions for cost-basis checks. Using `assert_in_delta` here would make tests less brittle if value representation/precision changes (e.g., BigDecimal/float conversion paths).

Optional test hardening example:

```diff
- assert_equal 110.0, cost_basis_for(calc, security, second_buy)
- assert_equal 110.0, cost_basis_for(calc, security, Date.current)
+ assert_in_delta 110.0, cost_basis_for(calc, security, second_buy).to_f, 1e-6
+ assert_in_delta 110.0, cost_basis_for(calc, security, Date.current).to_f, 1e-6
```

Also applies to: 203-209, 221-222, 235-237
🤖 Prompt for AI Agents: verify the finding against the current code and only fix it if needed. In `test/models/holding/reverse_calculator_test.rb` (lines 189-190 and the similar assertions), replace strict `assert_equal` cost-basis checks with `assert_in_delta(expected, actual, delta)` using a small, consistent delta (e.g., 0.01) so tests tolerate minor float/BigDecimal precision differences.

app/models/holding/materializer.rb (1)
173-185: Keep the dedupe in SQL. `pluck(:security_id).uniq` still materializes every duplicate ID in Ruby; `distinct.pluck(:security_id)` preserves the behavior with less memory/GC churn.

Proposed tweak:

```diff
- portfolio_security_ids = account.trades.pluck(:security_id).uniq
+ portfolio_security_ids = account.trades.distinct.pluck(:security_id)
```

🤖 Prompt for AI Agents: verify the finding against the current code and only fix it if needed. In `app/models/holding/materializer.rb` (lines 173-185), replace the in-Ruby dedupe in purge_stale_holdings with a SQL-level distinct so the query returns unique security IDs directly from the database.

app/models/holding/portfolio_cache.rb (1)
17-22: Return a copy of the cached trades. `group_by` stores the arrays inside the memoized hash, so returning them directly makes `get_trades` externally mutable. That is a behavior change from the previous fresh-array return and can poison the cache if callers mutate the result.

Proposed tweak:

```diff
- trades_by_date[date] || []
+ trades_by_date[date]&.dup || []
```

🤖 Prompt for AI Agents: verify the finding against the current code and only fix it if needed. In `app/models/holding/portfolio_cache.rb` (lines 17-22), return fresh copies instead of the cached arrays: when `date` is blank return `trades.to_a` (or `trades.dup`), otherwise return `(trades_by_date[date] || []).dup` so callers cannot mutate the cached arrays, keeping the existing logic.
ℹ️ Review info
⚙️ Run configuration: defaults · Review profile: CHILL · Plan: Pro · Run ID: 1823688f-d2f2-4158-90e1-ccbf02ba6d59
📒 Files selected for processing (6): app/models/holding/forward_calculator.rb, app/models/holding/gapfillable.rb, app/models/holding/materializer.rb, app/models/holding/portfolio_cache.rb, app/models/holding/reverse_calculator.rb, test/models/holding/reverse_calculator_test.rb
Actionable comments posted: 1
🧹 Nitpick comments (1)
app/models/holding/portfolio_cache.rb (1)
89-93: Reuse the new security indexes here too. This path still walks `holdings` directly to derive ids, then `load_prices` groups the same collection again a few lines later. Pulling ids from `holdings_by_security_id.keys` (and similarly from `trades_by_security_id.keys`) would keep initialization closer to single-pass on large histories. As per coding guidelines: "Optimize database queries with proper indexes and prevent N+1 queries via includes/joins".

Possible refactor:

```diff
 def collect_unique_securities
-  unique_securities_from_trades = trades.map(&:entryable).map(&:security).uniq
-  unique_securities_from_trades = unique_securities_from_trades.select { |s| @security_ids.include?(s.id) } if @security_ids
-
-  return unique_securities_from_trades unless use_holdings
-
-  holding_security_ids = holdings.map(&:security_id).uniq
-  holding_security_ids = holding_security_ids.select { |id| @security_ids.include?(id) } if @security_ids
-  unique_securities_from_holdings = Security.where(id: holding_security_ids).to_a
-
-  (unique_securities_from_trades + unique_securities_from_holdings).uniq
+  ids = trades_by_security_id.keys
+  ids |= holdings_by_security_id.keys if use_holdings
+  ids &= @security_ids if @security_ids
+
+  Security.where(id: ids).to_a
 end
```

🤖 Prompt for AI Agents: verify the finding against the current code and only fix it if needed. In `app/models/holding/portfolio_cache.rb` (lines 89-93), derive security ids from the precomputed index keys (`holdings_by_security_id.keys`, `trades_by_security_id.keys`) instead of iterating holdings/trades, apply the existing `@security_ids` filter and uniq logic, then query `Security.where(id: ...)` as before so initialization stays single-pass and avoids N+1 behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `app/models/holding/portfolio_cache.rb`:
- Around lines 29-30: The RuboCop error is caused by the nested array literal used as a hash key in `security[:prices_by_date_and_source][[date, source]]`; add spaces inside the inner array so it reads `[[ date, source ]]`, then re-run the linter to confirm the nested-array spacing rule is satisfied.
ℹ️ Review info
⚙️ Run configuration: defaults · Review profile: CHILL · Plan: Pro · Run ID: 66efb3a5-dc44-4c6a-b516-3fc0b4d567e9
📒 Files selected for processing (1): app/models/holding/portfolio_cache.rb
🧹 Nitpick comments (1)
app/models/holding/portfolio_cache.rb (1)
109-112: Consider selecting only needed DB price columns to reduce memory footprint. Since this path can load many records, narrowing the selected fields can further reduce allocation pressure.
Proposed refinement:

```diff
 db_prices_by_security_id = Security::Price
   .where(security_id: security_ids, date: account.start_date..Date.current)
+  .select(:security_id, :date, :price, :currency)
   .group_by(&:security_id)
```

🤖 Prompt for AI Agents: verify the finding against the current code and only fix it if needed. In `app/models/holding/portfolio_cache.rb` (lines 109-112), the query building `db_prices_by_security_id` currently loads full `Security::Price` records into memory; select only the columns downstream logic needs before grouping, and ensure downstream code still accesses the selected fields (adjust accessors if you switch to arrays/hashes).
ℹ️ Review info
⚙️ Run configuration: defaults · Review profile: CHILL · Plan: Pro · Run ID: 2646bf42-9968-486e-a4a8-67ec900ba7d7
📒 Files selected for processing (1): app/models/holding/portfolio_cache.rb
Pushed a small follow-up commit addressing the open review nits I agreed with:
I also did a quick syntax pass locally. I couldn't run the full Rails test/lint suite in this environment because the local Ruby/Bundler here is 3.1 while the repo expects 3.4, so Bundler aborts before boot.
Force-pushed 1b2d7c1 to 4ba8bd5 · Compare
Investment accounts with large histories were pegging CPU at 100% during
sync. Root cause was a cluster of quadratic and superlinear algorithms in
the inner holding calculation loop. All are replaced with O(1) hash lookups
built from single-pass indexes over the already-loaded data.
Holding::PortfolioCache - load_prices:
Three O(SxN) patterns inside the per-security loop:
1. DB prices: `security.prices.where(...)` fired one SQL query per
security (N+1). Replaced with a single bulk query before the loop:
Security::Price.where(security_id: ..., date: ...).group_by(&:security_id)
70 securities -> 70 queries becomes 1.
2. Trade prices: `trades.select { |t| t.entryable.security_id == id }`
scanned the full trades array for every security - O(SxT). Replaced
with trades_by_security_id, pre-indexed once from the loaded array.
3. Holding prices: `holdings.select { |h| h.security_id == id }` - same
O(SxH) pattern. Replaced with holdings_by_security_id.
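The three replacements above share one pattern: build a hash index in a single pass over the already-loaded array, then probe it per security. A minimal plain-Ruby sketch of the idea (Struct and the sample data are illustrative stand-ins, not the PR's actual models):

```ruby
require "date"

# Illustrative stand-in for the loaded ActiveRecord trade objects.
Trade = Struct.new(:security_id, :date, :qty)

trades = [
  Trade.new(1, Date.new(2024, 1, 2), 10),
  Trade.new(2, Date.new(2024, 1, 2), 5),
  Trade.new(1, Date.new(2024, 1, 3), 3)
]

# Before: O(SxT) - rescan the whole trades array once per security.
slow = ->(id) { trades.select { |t| t.security_id == id } }

# After: O(T) once - build the index in a single pass, then O(1) probes.
trades_by_security_id = trades.group_by(&:security_id)

trades_by_security_id[1].map(&:qty) # => [10, 3]
```

`group_by` preserves insertion order within each bucket, so the indexed lookup returns exactly what the per-security `select` did.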
Prices are now indexed into prices_by_date and prices_by_date_and_source
hashes during load_prices, making get_price O(1) instead of scanning the
flat prices array on every lookup.
Holding::PortfolioCache - get_trades / get_price:
- get_trades(date:): `trades.select { |t| t.date == date }` (O(T) scan)
replaced with trades_by_date hash (O(1)).
- get_price: two `prices.select { p.date == date ... }.min_by` linear
scans replaced with direct hash lookups into prices_by_date and
prices_by_date_and_source.
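The composite-key index can be sketched like this (the Struct and sample prices are illustrative; the hash names mirror the description):

```ruby
require "date"

# Illustrative stand-in for a loaded price record.
Price = Struct.new(:date, :source, :value)

prices = [
  Price.new(Date.new(2024, 1, 2), "db", 100.0),
  Price.new(Date.new(2024, 1, 2), "trade", 101.0)
]

# Index once: a composite [date, source] key plus a per-date bucket.
prices_by_date_and_source = prices.to_h { |p| [[p.date, p.source], p] }
prices_by_date = prices.group_by(&:date)

# get_price becomes a hash probe instead of a select + min_by scan.
price = prices_by_date_and_source[[Date.new(2024, 1, 2), "db"]]
price.value # => 100.0
```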
Holding::PortfolioCache - collect_unique_securities:
`holdings.map(&:security)` traversed the security association on every
holding record (N+1 if not preloaded). Replaced with a pluck of
security_ids followed by a single Security.where(id: ...) batch load.
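A toy simulation of that difference (the "query log" is fake — it just counts what a real ORM would emit per association walk vs. per batch load; all names are illustrative):

```ruby
# Toy simulation: the query log records what a real ORM would emit.
QUERY_LOG = []

# Stand-in for a holding row whose `security` association lazy-loads.
HoldingRow = Struct.new(:security_id) do
  def security
    QUERY_LOG << "SELECT * FROM securities WHERE id = #{security_id}" # one query per row
    security_id
  end
end

holdings = [HoldingRow.new(1), HoldingRow.new(2), HoldingRow.new(1)]

holdings.map(&:security)            # N+1 pattern: one simulated query per holding
n_plus_one_queries = QUERY_LOG.size # => 3

QUERY_LOG.clear
ids = holdings.map(&:security_id).uniq # pure Ruby, no queries
QUERY_LOG << "SELECT * FROM securities WHERE id IN (#{ids.join(', ')})" # one batch load
batch_queries = QUERY_LOG.size # => 1
```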
Holding::ForwardCalculator / ReverseCalculator:
`holdings += build_holdings(...)` allocated a new array copy on every
iteration - O(N) per day x thousands of days = O(D^2) total allocations.
Replaced with holdings.concat(...) which appends in place, O(1).
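The difference is observable directly in plain Ruby: `+=` rebinds the variable to a freshly allocated array each iteration, while `concat` mutates the receiver in place:

```ruby
holdings = [:a]
before_id = holdings.object_id

holdings += [:b]                # rebinding: allocates a brand-new array
holdings.object_id == before_id # => false

holdings = [:a]
before_id = holdings.object_id
holdings.concat([:b])           # appends in place; no copy of the prefix
holdings.object_id == before_id # => true
```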
Holding::ReverseCalculator - precompute_cost_basis:
Old: walked every date from account.start_date to Date.current (O(D)),
writing a cost_basis entry for every security on every date. For an
account with 2 trades over 9,250 days this wrote ~18,500 hash entries
and consumed the full date range in the outer loop regardless of trade
density.
New: walks only buy trades (O(T)), appending one [date, avg_cost]
snapshot per trade. cost_basis_for binary-searches the sparse snapshot
array - O(log T) per lookup. Memory drops from O(DxS) to O(T).
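A minimal sketch of the sparse-snapshot lookup using Ruby's built-in `Range#bsearch` in find-minimum mode (the snapshot data is illustrative and this is not the PR's exact implementation, though `cost_basis_for` is the method name the description uses):

```ruby
require "date"

# Sparse snapshots: one [date, avg_cost] pair per buy trade, sorted by date.
snapshots = [
  [Date.new(2020, 1, 10), 100.0],
  [Date.new(2022, 6, 1), 110.0]
]

# Last snapshot at or before `date`, via binary search - O(log T) per lookup.
def cost_basis_for(snapshots, date)
  # bsearch (find-minimum mode) returns the first index whose date is AFTER `date`.
  idx = (0...snapshots.size).bsearch { |i| snapshots[i][0] > date }
  idx = idx.nil? ? snapshots.size - 1 : idx - 1
  idx >= 0 ? snapshots[idx][1] : 0.0
end

cost_basis_for(snapshots, Date.new(2021, 3, 1)) # => 100.0 (between the two buys)
cost_basis_for(snapshots, Date.new(2023, 1, 1)) # => 110.0 (after the last buy)
cost_basis_for(snapshots, Date.new(2019, 1, 1)) # => 0.0   (before any buy)
```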
Holding::Gapfillable:
`security_holdings.find { |h| h.date == date }` was called on every
date in the gapfill range - O(H) per date, O(HxD) total. Replaced with
security_holdings.index_by(&:date) built once before the loop, making
each date lookup O(1).
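The same date index can be built in plain Ruby with `to_h` (ActiveSupport's `index_by(&:date)` is equivalent; the Struct and data below are illustrative):

```ruby
require "date"

# Illustrative stand-in for a holding record.
Holding = Struct.new(:date, :qty)

security_holdings = [
  Holding.new(Date.new(2024, 1, 2), 10),
  Holding.new(Date.new(2024, 1, 5), 13)
]

# Build the date index once, before the gapfill loop.
holdings_by_date = security_holdings.to_h { |h| [h.date, h] }

# Each date in the gapfill range is now an O(1) probe instead of a find scan.
holdings_by_date[Date.new(2024, 1, 5)].qty # => 13
holdings_by_date[Date.new(2024, 1, 3)]     # => nil (a gap to fill)
```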
Holding::Materializer - purge_stale_holdings:
`account.entries.trades.map { |entry| entry.entryable.security_id }.uniq`
loaded all trade entry records into Ruby then traversed the entryable
association on each (N+1). Replaced with account.trades.pluck(:security_id).uniq
(single SQL query returning only the IDs).
In testing, these changes were able to reduce sync time of an account with
25 years of history and 70 securities from about 90 minutes down to under
3 minutes.
* return dup'd arrays from PortfolioCache#get_trades so callers can't mutate memoized cache state
* use the precomputed security-id indexes in collect_unique_securities
* keep security-id dedupe in SQL via distinct.pluck(:security_id)
* tighten the DB price preload to select only needed columns
* harden cost-basis assertions with assert_in_delta
Force-pushed 4ba8bd5 to 84104a4 · Compare
Pulled the sure-admin changes into my branch. All tests pass.
Ah, perfect! It was a "trigger-happy" Thank you. 🙏
Actually, here's more food for thought: This is a high-quality, well-reasoned performance PR with a dramatic real-world impact. The algorithmic changes are correct and the cost-basis tests are solid. Three items worth addressing before merge:
Full Sonnet 4.6 review: PR #1579 — Performance improvements in holding calculation pipeline

Overview

This PR replaces a cluster of quadratic/superlinear algorithms in the holding sync pipeline with pre-built hash indexes and a single bulk DB query. The author reports 90 min → <3 min for a 25-year, 70-security account. The changes are focused and well-justified.

Code Quality & Correctness
The N+1 elimination (70 SQL queries → 1) is the biggest single win. However, the bulk query uses a partial

```ruby
Security::Price
  .where(security_id: security_ids, date: account.start_date..Date.current)
  .select(:security_id, :date, :price, :currency)
```

This drops
```diff
- portfolio_security_ids = account.entries.trades.map { |entry| entry.entryable.security_id }.uniq
+ portfolio_security_ids = account.trades.distinct.pluck(:security_id)
```

This assumes
The snapshot + binary-search approach is correct, but there's a subtle edge case worth noting: two buy trades on the same date produce two snapshots for that date. The binary search scans

The hand-rolled binary search is fine. An alternative is
```ruby
def get_trades(date: nil)
  if date.blank?
    trades.dup                      # added
  else
    trades_by_date[date]&.dup || [] # added
  end
end
```

Performance

All algorithmic changes are correct and the PR description's O() analysis matches the code. A few secondary notes:
Test Coverage

Good:
Missing:
Existing Review Comments

The RuboCop