Skip to content

billing: split invoices_ext data view from authorization wrapper#2970

Merged
jshearer merged 1 commit into
masterfrom
jshearer/fix_billing_queries
May 28, 2026
Merged

billing: split invoices_ext data view from authorization wrapper#2970
jshearer merged 1 commit into
masterfrom
jshearer/fix_billing_queries

Conversation

@jshearer
Copy link
Copy Markdown
Contributor

@jshearer jshearer commented May 26, 2026

Summary

After merging #2883 and querying for invoices, I observed behavior that didn't show up locally: anything querying tenant.billing.invoices would hang and eventually fail with a database timeout. Ultimately, this is because we're querying public.invoices_ext which is (somewhat poorly) designed to be queried by PostgREST, so it contains an authorized_tenants CTE that resolves the caller's admin-able tenants, joined into all three UNION ALL branches. This has some problems:

  1. The CTE blocks predicate pushdown. The resolver issues WHERE billed_prefix = $1, but billed_prefix in the view is authorized_tenants.tenant, computed inside the CTE. The outer predicate can only filter the CTE's output, not the source tables, so billing_historicals is scanned with no tenant constraint and nested-loop joined against the CTE's row set.

  2. The preview branch is per-tenant-per-month. For each row of authorized_tenants it evaluates generate_series(tenant.created_at, now(), '1 month') and calls internal.billing_report_202308(tenant, month) for every month not yet frozen. That function reads catalog_stats with window functions per call.

  3. The resolver's predicate ($6::text IS NULL OR invoice_type::text = $6) cannot trigger UNION ALL branch elimination under a generic plan, because the filter is not a constant the planner can evaluate against each branch's literal invoice_type. So the preview branch survives into the plan on every call once the prepared statement flips to generic.

To resolve the problem, this change extracts the data logic into a new internal.invoices_ext view that directly queries internal.billing_historicals, public.tenants, and internal.manual_bills. public.invoices_ext is rewritten as a simple authz wrapper around internal.invoices_ext using its existing authorized_tenants CTE, preserving the public view's authorization behavior and column shape. The GraphQL resolver, which already calls verify_authorization before issuing SQL, queries internal.invoices_ext directly.

Note: I would like to take a deeper look at the whole invoice system in the future. As it stands, there is a lot of disjoint business logic and data models that I would like to bring together somehow: monthly usage-based invoices are stored in one place, generated by an enormously complex SQL function. Manually entered bills exist somewhere else, and then this view tries to paper over all of that and present a consistent surface.

I'd like to migrate it all to Rust eventually, but I felt that this PR was a justified middle ground to getting the new APIs working before we do.

Diff between old `public.invoices_ext` and new `internal.invoices_ext` showing that the net is only removing the authorization logic:
--- OLD public.invoices_ext
+++ NEW internal.invoices_ext
@@ -1,62 +1,43 @@
-create view public.invoices_ext as 
-  with  has_bypassrls as (
-  select  (exists (
-  select  1 
-  from  pg_roles 
-  where  ((pg_roles.rolname = current_role)
-  and (pg_roles.rolbypassrls = true))))
-  as bypass ), authorized_tenants as (
-  select  tenants.tenant, tenants.created_at 
-  from  ((public.tenants left 
-  join  has_bypassrls
-  on (true)) left 
-  join  public.auth_roles('admin'::public.grant_capability) auth_roles(role_prefix, capability)
-  on (((tenants.tenant)::text ^@ (auth_roles.role_prefix)::text))) 
-  where  (has_bypassrls.bypass
-  or (auth_roles.role_prefix is not null))) 
+create view internal.invoices_ext as 
   select  (date_trunc('month'::text, ((h.report ->> 'billed_month'::text))::timestamp 
   with  time zone))::date
   as date_start , (((date_trunc('month'::text, ((h.report ->> 'billed_month'::text))::timestamp 
   with  time zone) + '1 mon'::interval) - '1 day'::interval))::date
-  as date_end , (authorized_tenants.tenant)::text
+  as date_end , (h.tenant)::text
   as billed_prefix , coalesce(nullif((h.report -> 'line_items'::text), 'null'::jsonb), '[]'::jsonb)
   as line_items , (coalesce(nullif((h.report -> 'subtotal'::text), 'null'::jsonb), to_jsonb(0)))::integer
   as subtotal , h.report
   as extra , 'final'::text
   as invoice_type  
-  from  (internal.billing_historicals h 
-  join  authorized_tenants
-  on (((h.tenant)::text ^@ (authorized_tenants.tenant)::text))) 
+  from  internal.billing_historicals h 
   union all  
   select  (date_trunc('month'::text, ((report.report ->> 'billed_month'::text))::timestamp 
   with  time zone))::date
   as date_start , (((date_trunc('month'::text, ((report.report ->> 'billed_month'::text))::timestamp 
   with  time zone) + '1 mon'::interval) - '1 day'::interval))::date
-  as date_end , (authorized_tenants.tenant)::text
+  as date_end , (t.tenant)::text
   as billed_prefix , coalesce(nullif((report.report -> 'line_items'::text), 'null'::jsonb), '[]'::jsonb)
   as line_items , (coalesce(nullif((report.report -> 'subtotal'::text), 'null'::jsonb), to_jsonb(0)))::integer
   as subtotal , report.report
   as extra , 'preview'::text
   as invoice_type  
-  from  ((authorized_tenants 
-  join  lateral generate_series((greatest('2023-08-01'::date, (date_trunc('month'::text, authorized_tenants.created_at))::date))::timestamp 
+  from  public.tenants t 
+  join  lateral generate_series((greatest('2023-08-01'::date, (date_trunc('month'::text, t.created_at))::date))::timestamp 
   with  time zone, date_trunc('month'::text, ((now())::date)::timestamp 
   with  time zone), '1 mon'::interval) invoice_month(invoice_month)
-  on ((not (exists (
+  on not exists (
   select  1 
   from  internal.billing_historicals h 
-  where  (((h.tenant)::text ^@ (authorized_tenants.tenant)::text)
+  where  ((h.tenant)::text ^@ (t.tenant)::text)
   and ((date_trunc('month'::text, ((h.report ->> 'billed_month'::text))::timestamp 
-  with  time zone))::date = invoice_month.invoice_month))))))) 
-  join  lateral internal.billing_report_202308((authorized_tenants.tenant)::public.catalog_prefix, invoice_month.invoice_month) report(report)
-  on (true)) 
+  with  time zone))::date = invoice_month.invoice_month)) 
+  join  lateral internal.billing_report_202308((t.tenant)::public.catalog_prefix, invoice_month.invoice_month) report(report)
+  on true 
   union all  
-  select  manual_bills.date_start, manual_bills.date_end, (authorized_tenants.tenant)::text
+  select  manual_bills.date_start, manual_bills.date_end, (manual_bills.tenant)::text
   as billed_prefix , jsonb_build_array(jsonb_build_object('description', manual_bills.description, 'count', 1, 'rate', manual_bills.usd_cents, 'subtotal', manual_bills.usd_cents))
   as line_items , manual_bills.usd_cents
   as subtotal , 'null'::jsonb
   as extra , 'manual'::text
   as invoice_type  
-  from  (internal.manual_bills 
-  join  authorized_tenants
-  on (((manual_bills.tenant)::text ^@ (authorized_tenants.tenant)::text)))
+  from  internal.manual_bills
Following is the meat of a Claude-assisted analysis of the root cause:

Smoking gun: old plan

EXPLAIN of the resolver's exact query with JWT session context set:

Limit  (cost=26360849.10..26360849.11 rows=2 width=330)
  ->  Sort  (cost=26360849.10..26363898.24 rows=1219654 width=330)
        ->  Append  (cost=430705.24..26336456.02 rows=1219654 width=330)
              CTE authorized_tenants
                ->  Nested Loop Left Join  (cost=2.61..430705.24 rows=95326)
                      Join Filter: ((tenants.tenant)::text ^@ (auth_roles.role_prefix)::text)
                      ->  Seq Scan on public.tenants                  (rows=19113)
                      ->  Function Scan on public.auth_roles          (rows=1000)
              ->  Nested Loop  (cost=0.00..3400808.72 rows=743672)         [final branch]
                    Join Filter: ((h.tenant)::text ^@ (authorized_tenants.tenant)::text)
                    ->  Seq Scan on internal.billing_historicals      (rows=311812)
                    ->  CTE Scan on authorized_tenants                (filtered to 'estuary/')
              ->  Nested Loop  (cost=133774.18..22483977.33 rows=475808)   [preview branch]
                    ->  Merge Anti Join  (rows=475808)
                          ->  ... authorized_tenants x generate_series ...
                          ->  ... billing_historicals materialize ...
                    ->  Function Scan on internal.billing_report_202308
              ->  Nested Loop  (cost=0.00..2669.93 rows=174)               [manual branch]

Total estimated cost 26.4M, dominated by the preview branch's Function Scan on internal.billing_report_202308.

Smoking gun: new plan

Same resolver query against the new internal.invoices_ext, inlined into the test query because the migration is not yet deployed. Two plan regimes verified.

Constant-folded (literal SQL, planner can prune)

Limit  (cost=27.28..27.29 rows=2 width=452) (actual time=1.962..1.963 rows=2 loops=1)
  Buffers: shared hit=37 read=2
  ->  Sort  (Sort Method: top-N heapsort  Memory: 33kB)
        ->  Index Scan using billing_historicals_tenant_billed_month_key on billing_historicals h
              Index Cond: (h.tenant)::text = 'estuary/'::text
              (actual rows=36)
Execution Time: 1.997 ms

Preview and manual branches pruned at plan time. The historicals scan returns the 36 rows of estuary/'s frozen invoices, top-N heapsorted to 2.

Planner-opaque (simulates the prod generic-plan regime)

The resolver's prepared statement under a generic plan hides the parameter values from the planner. We simulate this by routing the tenant and type filters through current_setting(), which is STABLE but not evaluable at plan time, so the planner cannot prune branches:

WHERE billed_prefix = current_setting('myapp.tenant', true)
  AND (current_setting('myapp.invoice_type', true) IS NULL
       OR invoice_type = current_setting('myapp.invoice_type', true))

Plan:

Limit  (actual time=0.677..0.680 rows=2 loops=1)
  Buffers: shared hit=39
  ->  Sort (top-N heapsort  Memory: 33kB)
        ->  Append (actual rows=36 loops=1)
              ->  Result  [final]    One-Time Filter: (current_setting('myapp.invoice_type', true) IS NULL
                                                       OR 'final' = current_setting('myapp.invoice_type', true))
                    ->  Index Scan billing_historicals  (actual rows=36)
                          Index Cond: (tenant)::text = current_setting('myapp.tenant', true)
              ->  Result  [preview]  One-Time Filter: <same shape, 'preview' literal>
                    ->  Nested Loop  (never executed)
                          ->  Index Scan tenants_tenant_key on tenants  (never executed)
                          ->  generate_series  (never executed)
                          ->  Function Scan on internal.billing_report_202308  (never executed)
              ->  Result  [manual]   One-Time Filter: <same shape, 'manual' literal>
                    ->  Seq Scan manual_bills  (never executed)
Execution Time: 0.760 ms

This is the actual prod regime. All three branches survive into the plan tree because the planner cannot decide the type filter at plan time. But each UNION ALL branch emits a literal invoice_type ('final', 'preview', 'manual'), so the predicate <literal> = current_setting(...) collapses to a One-Time Filter evaluated once at execution start per branch. The two inactive branches' subtrees are marked (never executed). billing_report_202308 is not invoked.

@jshearer jshearer force-pushed the jshearer/fix_billing_queries branch 2 times, most recently from 01f543d to 18424e0 Compare May 26, 2026 15:51
@jshearer jshearer requested a review from a team May 26, 2026 15:55
@jshearer jshearer self-assigned this May 26, 2026
`public.invoices_ext`'s `authorized_tenants` CTE materializes the full `tenants` table for any role with `rolbypassrls`, and its LATERAL preview branch runs `generate_series` and `billing_report_202308` across every materialized tenant. The GraphQL resolver's parameterized `WHERE billed_prefix = $1` did not push through the CTE, so `tenant.invoices` hangs. Extract the data logic into `internal.invoices_ext` (no auth) and query it directly from the resolver, which already runs `verify_authorization`:

* `billed_prefix = $1` pushes through `UNION ALL` into each branch's tenant predicate; preview's `tenants` scan becomes a single-row index lookup.
* `invoiceType` filters cause the planner to prune the preview and manual branches entirely.

`public.invoices_ext` is rewritten as a thin join over the internal view; `billing-integrations/publish.rs` and the pgTAP suite behave identically.
@jshearer jshearer force-pushed the jshearer/fix_billing_queries branch from 18424e0 to 41ecb1d Compare May 27, 2026 00:22
Copy link
Copy Markdown
Contributor

@GregorShear GregorShear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't test the gql billing operations without setting up stripe integration on my local. instead I queried the views directly, and tested that they both return the same results, billed_prefix filter works, and changing my postgres user's role correctly denies access to the internal view.

@jshearer jshearer merged commit 3100a13 into master May 28, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants