Conversation
Will be used in an upcoming commit. [I recently patched this to add TypeScript type definitions][0], and have manually audited this whole dependency (it's very small). [0]: mk-pmb/getown-js@2c3e325
Will be used in an upcoming commit.
Will be used in an upcoming commit.
Will be used in an upcoming commit.
Nothing reads this yet. Wait for an upcoming commit.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
WalkthroughThis PR replaces outbox-based welcome email scheduling with a polling-based automation system. It adds a WelcomeEmailAutomationsService (with init(domainEvents)), a StartAutomationsPollEvent, a oneAtATime helper, and a poll worker that processes WelcomeEmailAutomationRun records. Member repository and Members API wiring now create WelcomeEmailAutomationRun entries instead of Outbox events. Boot now initializes the new service (while leaving outbox initialization present with a deprecation TODO). Tests were added/updated to exercise the new automation runs and poll behavior. Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This is the bulk of the change, but aims to follow the agreed-upon spec closely.
Nothing subscribes to this yet. Wait for an upcoming commit.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #27364 +/- ##
==========================================
- Coverage 73.45% 73.45% -0.01%
==========================================
Files 1546 1550 +4
Lines 123752 124141 +389
Branches 14968 15011 +43
==========================================
+ Hits 90908 91189 +281
- Misses 31823 31930 +107
- Partials 1021 1022 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
ghost/core/core/server/services/welcome-email-automations/events/start-automations-poll-event.js (1)
3-15: Consider removingtimestampuntil it is consumed.In the current flow, poll scheduling uses DB
ready_atand callback timing; this field appears unused, so dropping it would simplify the event contract.✂️ Optional simplification
module.exports = class StartAutomationsPollEvent { - /** - * `@param` {Date} timestamp - */ - constructor(timestamp) { + constructor() { this.data = null; - this.timestamp = timestamp; } @@ static create() { - return new StartAutomationsPollEvent(new Date()); + return new StartAutomationsPollEvent(); } };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ghost/core/core/server/services/welcome-email-automations/events/start-automations-poll-event.js` around lines 3 - 15, The StartAutomationsPollEvent constructor currently stores an unused timestamp field; remove the timestamp parameter and the this.timestamp assignment from the constructor (and update its JSDoc), and update the static create() factory to call new StartAutomationsPollEvent() with no arguments; also search for any external usages of the timestamp property or constructor signature (StartAutomationsPollEvent, constructor, create) and update callers if necessary so the event contract no longer exposes timestamp until it is actually consumed.ghost/core/core/shared/one-at-a-time.js (1)
24-28: Add observability for swallowed failures.Catching and fully ignoring errors here can hide persistent poll failures. Keep the control-flow behavior, but report the error.
🔧 Suggested minimal refactor
-const oneAtATime = (fn) => { +const oneAtATime = (fn, {onError = () => {}} = {}) => { @@ - } catch { - // noop + } catch (err) { + onError(err); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ghost/core/core/shared/one-at-a-time.js` around lines 24 - 28, The catch block currently swallows errors from awaiting fn(); change the anonymous catch to catch (err) and report the error before continuing control flow: capture the thrown error (err) and call the project's logging/observability API (e.g., call a logger.error or error-reporter.captureException) or at minimum console.error(err) so failures are visible while preserving the no-op behavior otherwise; update the try { await fn(); } catch (...) block to use catch (err) { /* report err */ }.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@ghost/core/core/server/services/members/members-api/repositories/member-repository.js`:
- Line 408: The call to
dispatchEvent(this.dispatchEvent(StartAutomationsPollEvent.create(),
memberAddOptions)) passes an event whose data is null, which causes
dispatchEvent()'s rejection path to throw when it attempts to read
event.data.memberId; fix by ensuring the StartAutomationsPollEvent instance
includes a non-null data.memberId (e.g. call
StartAutomationsPollEvent.create({memberId: member.id}) or otherwise populate
data before passing to dispatchEvent), or alternatively wrap the call-site to
pass a safe data object into dispatchEvent; target the
StartAutomationsPollEvent.create() invocation and the dispatchEvent(...) call so
the event always has data.memberId when dispatched.
In `@ghost/core/core/server/services/welcome-email-automations/index.js`:
- Around line 24-37: The current enqueueAnotherPollAt implementation uses an
in-memory setTimeout inside domainEvents.subscribe(StartAutomationsPollEvent,
...) so future polls are lost on restart; change enqueueAnotherPollAt to persist
or re-arm scheduled wake-ups instead of using setTimeout by delegating to the
app Scheduler (or writing the ready_at to the DB and registering it with the
Scheduler on boot) so that poll({ memberWelcomeEmailService, ... }) will be
invoked at the saved ready_at even after a restart; update the subscription/boot
path to restore any persisted future polls and ensure enqueuePollNow remains
available for immediate wake-ups.
In `@ghost/core/core/server/services/welcome-email-automations/poll.js`:
- Around line 181-220: The code currently calls
memberWelcomeEmailService.api.send before persisting AutomatedEmailRecipient and
calling markExited, so if the DB work fails the catch block can schedule a retry
and duplicate the email; fix by making the send idempotent or moving it after
the DB transaction commits: either (A) insert AutomatedEmailRecipient and call
markExited within db.knex.transaction first, commit, then call
memberWelcomeEmailService.api.send using the saved member_uuid/member_email so
send only occurs after successful DB work, or (B) add an existence check of
AutomatedEmailRecipient (e.g., before send look up AutomatedEmailRecipient by
member_uuid and automated_email_id) and skip sending if the recipient row
already exists; update references to memberWelcomeEmailService.api.send,
AutomatedEmailRecipient.add, markExited, markRetry, markMaxAttemptsExceeded and
enqueueAnotherPollAt accordingly.
- Around line 52-80: The current logic selects eligible runs then updates them
later, allowing races; change the selection to claim rows atomically using
row-level locking within the same transaction by applying FOR UPDATE SKIP LOCKED
(Knex .forUpdate().skipLocked()) on the query against
welcome_email_automation_runs (the query that populates runs) so only unlocked
rows are returned, then perform the update (setting step_started_at/updated_at
and incrementing step_attempts) on those locked rows within the same trx; ensure
you still return the selected rows (populate runs) for processing and remove the
separate ids push/select-then-update race.
- Around line 175-180: Before calling api.send, add explicit bailout checks
after Member.findOne by: 1) verify the member exists for run.member_id and, if
not, mark the run as a terminal/stale state (so it won't be retried) and return;
2) check the member's subscription/unsubscribe flags and, if unsubscribed, mark
the run terminal and return; 3) check the member's current status (free/paid or
other relevant fields) against whatever state the run expects, and if it has
changed, mark the run terminal and return; implement these checks in poll.js
around the Member.findOne result and ensure you update the run record (set a
terminal status like "exited"/"stale" on the run) before returning so api.send
is never called for stale/deleted/unsubscribed/changed members.
---
Nitpick comments:
In
`@ghost/core/core/server/services/welcome-email-automations/events/start-automations-poll-event.js`:
- Around line 3-15: The StartAutomationsPollEvent constructor currently stores
an unused timestamp field; remove the timestamp parameter and the this.timestamp
assignment from the constructor (and update its JSDoc), and update the static
create() factory to call new StartAutomationsPollEvent() with no arguments; also
search for any external usages of the timestamp property or constructor
signature (StartAutomationsPollEvent, constructor, create) and update callers if
necessary so the event contract no longer exposes timestamp until it is actually
consumed.
In `@ghost/core/core/shared/one-at-a-time.js`:
- Around line 24-28: The catch block currently swallows errors from awaiting
fn(); change the anonymous catch to catch (err) and report the error before
continuing control flow: capture the thrown error (err) and call the project's
logging/observability API (e.g., call a logger.error or
error-reporter.captureException) or at minimum console.error(err) so failures
are visible while preserving the no-op behavior otherwise; update the try {
await fn(); } catch (...) block to use catch (err) { /* report err */ }.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 38827c5b-b619-45a1-a2cb-623d81c8e6c3
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (14)
ghost/core/core/boot.jsghost/core/core/server/services/members/api.jsghost/core/core/server/services/members/members-api/members-api.jsghost/core/core/server/services/members/members-api/repositories/member-repository.jsghost/core/core/server/services/welcome-email-automations/events/start-automations-poll-event.jsghost/core/core/server/services/welcome-email-automations/index.jsghost/core/core/server/services/welcome-email-automations/poll.jsghost/core/core/shared/one-at-a-time.jsghost/core/package.jsonghost/core/test/integration/services/member-welcome-emails.test.jsghost/core/test/integration/services/welcome-email-automations/poll.test.jsghost/core/test/unit/server/services/members/members-api/repositories/member-repository.test.jsghost/core/test/unit/server/services/welcome-email-automations/index.test.jsghost/core/test/unit/shared/one-at-a-time.test.js
| } | ||
|
|
||
| this.dispatchEvent(StartOutboxProcessingEvent.create({memberId: member.id}), memberAddOptions); | ||
| this.dispatchEvent(StartAutomationsPollEvent.create(), memberAddOptions); |
There was a problem hiding this comment.
StartAutomationsPollEvent can break dispatchEvent()'s rollback handler.
These new calls pass an event whose data is null, but dispatchEvent()'s rejection path logs event.data.memberId. If the surrounding transaction rolls back, that log statement throws while handling the rollback, so this "best effort" path can become an unhandled rejection instead.
Also applies to: 1507-1507
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@ghost/core/core/server/services/members/members-api/repositories/member-repository.js`
at line 408, The call to
dispatchEvent(this.dispatchEvent(StartAutomationsPollEvent.create(),
memberAddOptions)) passes an event whose data is null, which causes
dispatchEvent()'s rejection path to throw when it attempts to read
event.data.memberId; fix by ensuring the StartAutomationsPollEvent instance
includes a non-null data.memberId (e.g. call
StartAutomationsPollEvent.create({memberId: member.id}) or otherwise populate
data before passing to dispatchEvent), or alternatively wrap the call-site to
pass a safe data object into dispatchEvent; target the
StartAutomationsPollEvent.create() invocation and the dispatchEvent(...) call so
the event always has data.memberId when dispatched.
| domainEvents.subscribe(StartAutomationsPollEvent, oneAtATime(async () => { | ||
| await poll({ | ||
| memberWelcomeEmailService, | ||
| enqueueAnotherPollNow: enqueuePollNow, | ||
| enqueueAnotherPollAt: (date) => { | ||
| // TODO(NY-1191): Use Scheduler instead of `setTimeout`. | ||
| setTimeout(() => { | ||
| enqueuePollNow(); | ||
| }, date.getTime() - Date.now()); | ||
| } | ||
| }); | ||
| })); | ||
|
|
||
| enqueuePollNow(); |
There was a problem hiding this comment.
Persist or re-arm future polls across restarts.
enqueueAnotherPollAt() only creates an in-memory setTimeout. If Ghost restarts after a run is saved with a future ready_at, that wake-up is lost. On boot this service only fires an immediate poll, and poll() only claims ready_at <= now, so those future retries can sit forever until some unrelated event happens to dispatch another poll.
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis
[warning] 29-29: Complete the task associated to this "TODO" comment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ghost/core/core/server/services/welcome-email-automations/index.js` around
lines 24 - 37, The current enqueueAnotherPollAt implementation uses an in-memory
setTimeout inside domainEvents.subscribe(StartAutomationsPollEvent, ...) so
future polls are lost on restart; change enqueueAnotherPollAt to persist or
re-arm scheduled wake-ups instead of using setTimeout by delegating to the app
Scheduler (or writing the ready_at to the DB and registering it with the
Scheduler on boot) so that poll({ memberWelcomeEmailService, ... }) will be
invoked at the saved ready_at even after a restart; update the subscription/boot
path to restore any persisted future polls and ensure enqueuePollNow remains
available for immediate wake-ups.
| return db.knex.transaction(async (trx) => { | ||
| /** @type {Run[]} */ | ||
| const runs = await trx('welcome_email_automation_runs as r') | ||
| .join('welcome_email_automations as a', 'r.welcome_email_automation_id', 'a.id') | ||
| .join('welcome_email_automated_emails as e', 'r.next_welcome_email_automated_email_id', 'e.id') | ||
| .whereNotNull('r.next_welcome_email_automated_email_id') | ||
| .where('r.ready_at', '<=', now) | ||
| .where(function () { | ||
| this.whereNull('r.step_started_at').orWhere('r.step_started_at', '<', lockCutoff); | ||
| }) | ||
| .select('r.id', 'r.member_id', 'r.step_attempts', 'r.next_welcome_email_automated_email_id', 'a.slug as automation_slug', 'e.id as automated_email_id') | ||
| .limit(MAX_RUNS_PER_BATCH); | ||
|
|
||
| if (runs.length === 0) { | ||
| return runs; | ||
| } | ||
|
|
||
| /** @type {string[]} */ const ids = []; | ||
|
|
||
| for (const run of runs) { | ||
| ids.push(run.id); | ||
| run.step_attempts += 1; | ||
| } | ||
|
|
||
| await trx('welcome_email_automation_runs').whereIn('id', ids).update({ | ||
| step_started_at: now, | ||
| step_attempts: db.knex.raw('step_attempts + 1'), | ||
| updated_at: now | ||
| }); |
There was a problem hiding this comment.
Claim runs atomically before processing them.
fetchAndLockRuns() reads eligible rows first and only updates them afterward. Two Ghost instances can select the same run before either update commits, so both workers can send the same welcome email. This needs a real claim step with row locking or another atomic ownership mechanism.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ghost/core/core/server/services/welcome-email-automations/poll.js` around
lines 52 - 80, The current logic selects eligible runs then updates them later,
allowing races; change the selection to claim rows atomically using row-level
locking within the same transaction by applying FOR UPDATE SKIP LOCKED (Knex
.forUpdate().skipLocked()) on the query against welcome_email_automation_runs
(the query that populates runs) so only unlocked rows are returned, then perform
the update (setting step_started_at/updated_at and incrementing step_attempts)
on those locked rows within the same trx; ensure you still return the selected
rows (populate runs) for processing and remove the separate ids
push/select-then-update race.
| const member = await Member.findOne({id: run.member_id}, {withRelated: ['newsletters']}); | ||
|
|
||
| // TODO(NY-1192): Bail if member no longer exists | ||
| // TODO(NY-1193): Bail if member is unsubscribed | ||
| // TODO(NY-1194): Bail if member's status has changed | ||
|
|
There was a problem hiding this comment.
Exit stale runs before calling api.send().
The TODOs here are currently user-visible behavior gaps: a deleted member turns into 10 pointless retries, and a retry can still send after unsubscribe or after the member changes free/paid state. These should be explicit exit paths, not generic send failures. If helpful, I can sketch the bailout flow.
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis
[warning] 178-178: Complete the task associated to this "TODO" comment.
[warning] 179-179: Complete the task associated to this "TODO" comment.
[warning] 177-177: Complete the task associated to this "TODO" comment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ghost/core/core/server/services/welcome-email-automations/poll.js` around
lines 175 - 180, Before calling api.send, add explicit bailout checks after
Member.findOne by: 1) verify the member exists for run.member_id and, if not,
mark the run as a terminal/stale state (so it won't be retried) and return; 2)
check the member's subscription/unsubscribe flags and, if unsubscribed, mark the
run terminal and return; 3) check the member's current status (free/paid or
other relevant fields) against whatever state the run expects, and if it has
changed, mark the run terminal and return; implement these checks in poll.js
around the Member.findOne result and ensure you update the run record (set a
terminal status like "exited"/"stale" on the run) before returning so api.send
is never called for stale/deleted/unsubscribed/changed members.
| await memberWelcomeEmailService.api.send({ | ||
| member: { | ||
| name: member.get('name'), | ||
| email: member.get('email'), | ||
| uuid: member.get('uuid') | ||
| }, | ||
| memberStatus | ||
| }); | ||
|
|
||
| await db.knex.transaction(async (transacting) => { | ||
| await AutomatedEmailRecipient.add({ | ||
| member_id: run.member_id, | ||
| automated_email_id: run.automated_email_id, | ||
| member_uuid: member.get('uuid'), | ||
| member_email: member.get('email'), | ||
| member_name: member.get('name') | ||
| }, {transacting}); | ||
|
|
||
| // TODO(NY-1195): Advance to next email when there are additional ones | ||
|
|
||
| await markExited(run.id, 'finished', transacting); | ||
| }); | ||
| } catch (err) { | ||
| logging.error( | ||
| { | ||
| system: { | ||
| event: 'welcome_email_automations.send_failed', | ||
| run_id: run.id | ||
| }, | ||
| err | ||
| }, | ||
| `${LOG_KEY} Failed to send welcome email for run ${run.id}` | ||
| ); | ||
|
|
||
| if (run.step_attempts < MAX_ATTEMPTS) { | ||
| const retryAt = new Date(Date.now() + RETRY_DELAY_MS); | ||
| await markRetry(run.id, retryAt); | ||
| enqueueAnotherPollAt(retryAt); | ||
| } else { | ||
| await markMaxAttemptsExceeded(run.id); |
There was a problem hiding this comment.
A successful send can be retried and duplicated here.
The email is sent before the recipient row and markExited() transaction. If the provider accepts the message and that DB work fails afterward, the catch block schedules a retry and the member can receive the same welcome email again. This boundary needs idempotency, not a blind retry.
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis
[warning] 199-199: Complete the task associated to this "TODO" comment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ghost/core/core/server/services/welcome-email-automations/poll.js` around
lines 181 - 220, The code currently calls memberWelcomeEmailService.api.send
before persisting AutomatedEmailRecipient and calling markExited, so if the DB
work fails the catch block can schedule a retry and duplicate the email; fix by
making the send idempotent or moving it after the DB transaction commits: either
(A) insert AutomatedEmailRecipient and call markExited within
db.knex.transaction first, commit, then call memberWelcomeEmailService.api.send
using the saved member_uuid/member_email so send only occurs after successful DB
work, or (B) add an existence check of AutomatedEmailRecipient (e.g., before
send look up AutomatedEmailRecipient by member_uuid and automated_email_id) and
skip sending if the recipient row already exists; update references to
memberWelcomeEmailService.api.send, AutomatedEmailRecipient.add, markExited,
markRetry, markMaxAttemptsExceeded and enqueueAnotherPollAt accordingly.
| const {oneAtATime} = require('../../../core/shared/one-at-a-time'); | ||
|
|
||
| /** | ||
| * Ponyfill of `Promise.withResolvers`. |
There was a problem hiding this comment.
| labels | ||
| }, {...memberAddOptions, transacting}); | ||
|
|
||
| const timestamp = eventData.created_at || newMember.get('created_at'); |
There was a problem hiding this comment.
Was it ok to remove this? Same question applies to timestamp below.
|
|
||
| return db.knex.transaction(async (trx) => { | ||
| /** @type {Run[]} */ | ||
| const runs = await trx('welcome_email_automation_runs as r') |
There was a problem hiding this comment.
I wasn't sure if this should use Bookshelf, but I didn't.
| } | ||
|
|
||
| memberWelcomeEmailService.init(); | ||
| await memberWelcomeEmailService.api.loadMemberWelcomeEmails(); |
There was a problem hiding this comment.
I'm pretty sure this needs to be done over time in case welcome emails change. Let me know if there's a way around this.
|
I feel that both the Codecov and SonarQube reports are not helpful in this case. |
|



closes https://linear.app/ghost/issue/NY-1190
I recommend reviewing this commit-by-commit.
This change should have no user impact. It replaces the current welcome email system with one powered by the new automations rails.
We've discussed the design for this change at some length. I hope this is a straightforward implementation of that design. Here's the prompt I gave to a coding agent:
📝 LLM prompt
I'm working on migrating member welcome emails to our new automations data model. See the attached design document below for background.
I've already created the relevant tables/models for this, but most things are still running with the old logic. I want to run member welcome emails on the new data model.
This change should make two significant changes:
MemberCreatedEvent. This code already exists and doesn't need to be added.MemberCreatedEvent. You should decide where this listener goes, but it should be set up on app boot (directly or indirectly); consider adding it to an existing part of the system that's already listening for domain events. Create a newWelcomeEmailAutomationRunand save it to the database.core/server/services/members/members-api/repositories/member-repository.js, remove_Outboxinteractions.welcomeEmailAutomationId: either the free or paid automation, depending on how the user signed upmemberId: the new/updated member's IDnextWelcomeEmailAutomatedEmailId: the first welcome email's ID. You can assume it's the only one where the automation ID matchesreadyAt: right now (we may change this in a followup patch)stepStartedAt:nullstepAttempts:0exitReason:nullclassservice that's initialized on startup. When an automations poll executes…WelcomeEmailAutomationRuns that we should start (seedesign_documentbelow for a sketch of a database query, which should use Knex/Bookshelf as much as possible, not raw SQL). Also join theWelcomeEmailAutomationandWelcomeEmailAutomatedEmail.stepStartedAtshould be "now"stepAttemptsshould be increased by 1WelcomeEmailAutomationRun…stepAttemptsis larger thanMAX_ATTEMPTS(which should be hard-coded to 10, but the hard-coded value could change in the future), mark the job totally failed (see below).core/server/services/outbox/handlers/member-created.js.stepAttempts, updatereadyAtto give some delay, and enqueue another automation poll. (Seedesign_documentbelow for a sketch of a database query, which should use Knex/Bookshelf as much as possible.)design_documentbelow for a sketch of a database query, which should use Knex/Bookshelf as much as possible.)This patch isn't meant to implement the full design. It's a step in the right direction. This patch should NOT:
This change should have no user impact. It should only change the plumbing of how member welcome emails are sent.
This implemented a great first draft. I made many changes to its output.