Non-blocking Interface for Payjoin State Machine #1446

Open
xstoicunicornx wants to merge 8 commits into payjoin:master from xstoicunicornx:async-compat-api

Conversation

@xstoicunicornx
Collaborator

@xstoicunicornx xstoicunicornx commented Mar 26, 2026

Summary

Currently some states (UncheckedOriginalPayload, MaybeInputsOwned, MaybeInputsSeen, OutputsUnknown, ProvisionalProposal) require synchronous callbacks to transition to the next state in the payjoin state machine. This is inconvenient for languages that use the payjoin FFI language bindings and require asynchronous calls for validation, for example when calling the bitcoin rpc. While there are some workarounds that can adequately validate the state machine transitions (except for UncheckedOriginalPayload) when relying on asynchronous validation, these are a bit unwieldy to use. Instead, this PR implements an asynchronous-compatible interface that makes it more straightforward to validate the state machine transitions with asynchronous calls.

Background

I was attempting to prove out the JavaScript language bindings by implementing a Node version of the payjoin-cli and opened issue #1389 after encountering some of the challenges with using asynchronous validation callbacks. It was revealed to me that this was a known pain point and that @arminsabouri had proposed a solution: creating new state transition methods that accept the result of a validation call. This is my attempt at implementing this solution.
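
To make the difference concrete, here is a minimal sketch of the two styles on a single transition. Every name in it (`Unchecked`, `Checked`, `Rejected`, `check_with`, `tx_to_check`, `apply_check`) is a hypothetical stand-in chosen for illustration and is not the payjoin crate's API.

```rust
// Hypothetical stand-ins; none of these names are the payjoin crate's API.
#[derive(Debug, PartialEq)]
pub struct Rejected(pub &'static str);

pub struct Unchecked {
    pub tx_hex: String,
}

#[derive(Debug, PartialEq)]
pub struct Checked {
    pub tx_hex: String,
}

impl Unchecked {
    /// Callback style: the check has to finish inside this call, so the
    /// closure cannot await anything.
    pub fn check_with(
        self,
        can_broadcast: impl FnOnce(&str) -> bool,
    ) -> Result<Checked, Rejected> {
        let ok = can_broadcast(&self.tx_hex);
        self.apply_check(ok)
    }

    /// Result style, step 1: hand the caller the data that needs checking.
    pub fn tx_to_check(&self) -> &str {
        &self.tx_hex
    }

    /// Result style, step 2: the caller ran the check however it liked
    /// (blocking, async, another process) and reports the outcome.
    pub fn apply_check(self, can_broadcast: bool) -> Result<Checked, Rejected> {
        if can_broadcast {
            Ok(Checked { tx_hex: self.tx_hex })
        } else {
            Err(Rejected("original transaction is not broadcastable"))
        }
    }
}
```

In this sketch the callback method is a thin wrapper over the result-accepting one, so an async caller can await its own RPC call between `tx_to_check` and `apply_check`.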

Any and all feedback is welcome! This is currently in draft status as I am still working on testing and the language bindings.

AI Disclosure

Unit tests were written with the help of Claude AI.

Pull Request Checklist

Please confirm the following before requesting review:

Collaborator

@spacebear21 spacebear21 left a comment

concept ACK. I mostly reviewed the v1 state machine before realizing those same changes are applied in v2 as well, so all my comments apply to the v2 changes too. I'm actually not sure we need to support an async-compatible interface for v1 at all? It's going to result in a lot of duplication and afaik v1 is a) a synchronous protocol and b) getting deprecated.

Comment thread payjoin/src/core/receive/v1/mod.rs Outdated
Comment thread payjoin/src/core/receive/v1/mod.rs Outdated
Comment thread payjoin/src/core/receive/v1/mod.rs Outdated
/// function sets that parameter to None so that it is ignored in subsequent steps of the
/// receiver flow. This protects the receiver from accidentally subtracting fees from their own
/// outputs.
#[cfg_attr(not(feature = "v1"), allow(dead_code))]
Collaborator

Unrelated to this PR, but this seems odd. Why is this "not v1" feature gate here in the v1 module?

Collaborator Author

I thought it was saying that if v1 is not being used, allow this as dead code, which seemed logical to me. But I am not really familiar with Rust conventions on how to handle features that aren't used.
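
For reference, a small sketch of how that attribute behaves. The function names are hypothetical; only the `cfg_attr` line mirrors the snippet under discussion.

```rust
// `cfg_attr(predicate, attr)` expands to `attr` only when `predicate` holds.
// Here that means `#[allow(dead_code)]` is applied only when the crate is
// built WITHOUT the `v1` feature; with `v1` enabled the attribute disappears
// and the usual dead-code lint applies again.
#[cfg_attr(not(feature = "v1"), allow(dead_code))]
fn v1_only_helper() -> u32 {
    1
}

// Hypothetical caller that only exists when `v1` is enabled. Without it,
// nothing uses `v1_only_helper`, which is what the `allow` silences.
#[cfg(feature = "v1")]
pub fn v1_entry_point() -> u32 {
    v1_only_helper()
}
```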

Comment thread payjoin/src/core/receive/v1/mod.rs Outdated
Comment thread payjoin/src/core/receive/mod.rs Outdated
@xstoicunicornx xstoicunicornx force-pushed the async-compat-api branch 2 times, most recently from b8a35e8 to b0d967f Compare March 27, 2026 02:57
@xstoicunicornx
Collaborator Author

I'm actually not sure we need to support an async-compatible interface for v1 at all? It's going to result in a lot of duplication and afaik v1 is a) synchronous protocol and b) getting deprecated.

Roger that, just removed v1 async interface.

@DanGould
Contributor

@spacebear21 if our current targets will use this interface and they support v1, we need to support v1. We have a clean cut point for v1 support but we haven't hit it yet, and forcing it with a regression (which I think not supporting v1 on this new api + migrating e.g. bbmobile to this new api would cause) is gonna bring us bigger problems than some duplicate code will.

@nothingmuch
Contributor

Concept ACK.

The new pub fns in the API should have some unit tests.

I would prefer if the existing callback based fns were refactored to use the newly introduced ones, since the latter are lower level. That should reduce complexity and improve readability.

@xstoicunicornx
Collaborator Author

Unit tests are definitely coming!

I would prefer if the existing callback based fns were refactored to use the newly introduced ones, since the latter are lower level. That should reduce complexity and improve readability.

I generally agree, however the existing callback based fns are more compatible with the OriginalPayload api which is also callback based. Would you be open to adding new non-callback based fns to OriginalPayload as well? Otherwise the existing callback based fns would have to transform their arguments to using the new non-callback based fns, and the new non-callback based fns would again have to transform their arguments to using the existing OriginalPayload callback based fns. If we go down this route I would also refactor the existing OriginalPayload callback based fns to use the new OriginalPayload non-callback based fns.

@nothingmuch
Contributor

nothingmuch commented Mar 27, 2026

I generally agree, however the existing callback based fns are more compatible with the OriginalPayload api which is also callback based. Would you be open to adding new non-callback based fns to OriginalPayload as well?

Yes. The docs have always made a point of emphasizing that this library does no IO. While that is technically true, it arguably holds only in a weaker sense than it should: normally it would imply the library is completely agnostic to all IO considerations, but the callback API makes things harder if the callbacks need to do async IO. IMO that gap was/is a bug.

We reasoned we could close this gap without technically breaking semver compatibility as it strictly adds API surface. Since there was no concrete need at the time, that meant it didn't need to be in the 1.0 milestone.

I think there is some benefit in frontloading this refactor (i.e. not putting it off to a followup PR), which is that dogfooding the new non-blocking API will help it be narrower (by finding good answers to things like the Vec<(ScriptBuf, bool)> question). If this non-blocking API is nice to use, and can make the callback based API basically trivial, then it'd make sense to support both an async and a blocking callback API as that wouldn't be much of a maintenance burden. But if there are mismatches that require non-trivial code to adapt, then the same justification doesn't really make sense.

As an example of something I'm hoping such dogfooding of the API can catch, the allocation of the temporary hashmap in the non-blocking ownership check may preclude its use on constrained hardware devices where allocation needs to be avoided. Finding a good type signature which is intuitive and easy to use, and which allows avoiding heap allocations, would avoid a situation in the future where we might need two versions of this functionality, one which avoids allocations and one which is required for backwards compatibility.
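
One possible shape for such a signature, as a rough sketch only: the caller supplies the scripts and the per-script results as iterators, and the check walks them in lockstep without building a map or a `Vec`. The names `CheckError` and `apply_ownership_results` are hypothetical, and plain byte slices stand in for scripts.

```rust
// Hypothetical sketch; byte slices stand in for scripts.
#[derive(Debug, PartialEq)]
pub enum CheckError {
    /// The input at this position belongs to the receiver.
    Owned(usize),
    TooFewResults,
    TooManyResults,
}

/// Walks scripts and caller-supplied results in lockstep. Nothing is
/// collected, so this function itself performs no heap allocation.
pub fn apply_ownership_results<'a>(
    scripts: impl IntoIterator<Item = &'a [u8]>,
    is_owned: impl IntoIterator<Item = bool>,
) -> Result<(), CheckError> {
    let mut results = is_owned.into_iter();
    for (index, _script) in scripts.into_iter().enumerate() {
        match results.next() {
            Some(true) => return Err(CheckError::Owned(index)),
            Some(false) => {}
            None => return Err(CheckError::TooFewResults),
        }
    }
    if results.next().is_some() {
        return Err(CheckError::TooManyResults);
    }
    Ok(())
}
```

The trade-off in this sketch is that results are matched to scripts purely by position, so the caller has to preserve order.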

@xstoicunicornx
Collaborator Author

PR has been updated to reflect a lot (but not all) of the feedback. Still very preliminary but just wanted to get this out there to verify I'm on the right track.

Things updated:

  • v1 interface is back
  • existing callback based fns now depend on the new non-blocking fns
  • replaced comment references to "async" with "non-blocking" as this seems like a more general and all encompassing term
  • removed HashMap dependency

Things not updated:

  • the Vec<(_, bool)> result types as I wanted to think about what the best way to handle it would be (also I just saw nothingmuch's comment which helps give me some direction)
  • tests (coming!!)

Please let me know if there is anything that is glaringly wrong. I will continue to refactor and work on the items that have not yet been updated in the next few days.

fn from(e: ProtocolError) -> Self { Error::Protocol(e) }
}

impl From<ImplementationError> for Error {
Contributor

this extends the public API, and since ImplementationError has some From impls (which arguably also shouldn't be there), it effectively allows any &str or Box<dyn Error + Send + Sync> to be converted to an Error

@chavic IIRC you have been auditing and removing some of these to remove unintended pub functionality, thoughts?
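
A simplified sketch of the concern, using stand-in definitions rather than the crate's actual types: once both `From` impls are public, a bare string can reach the top-level error in two hops.

```rust
// Simplified stand-ins that reuse the names from the discussion; these are
// not the payjoin crate's definitions.
#[derive(Debug)]
pub struct ImplementationError(Box<dyn std::error::Error + Send + Sync>);

impl ImplementationError {
    pub fn message(&self) -> String {
        self.0.to_string()
    }
}

impl From<&str> for ImplementationError {
    fn from(s: &str) -> Self {
        // std provides `From<&str>` for `Box<dyn Error + Send + Sync>`.
        ImplementationError(s.into())
    }
}

#[derive(Debug)]
pub enum Error {
    Implementation(ImplementationError),
}

impl From<ImplementationError> for Error {
    fn from(e: ImplementationError) -> Self {
        Error::Implementation(e)
    }
}

/// Any downstream code can now do this: &str -> ImplementationError -> Error.
pub fn validate(ok: bool) -> Result<(), Error> {
    if !ok {
        return Err(ImplementationError::from("wallet lookup failed").into());
    }
    Ok(())
}
```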

@nothingmuch
Contributor

Please let me know if there is anything that is glaringly wrong. I will continue to refactor and work on the items that have not yet been updated in the next few days.

the only thing that stands out at the moment is that there is a lot of new pub stuff, not all of which seems necessary to make pub (e.g. the gather utility method). relatedly, where Vec is returned it may be better to return impl IntoIterator etc, to avoid leaking implementation details into the pub fn type signatures. seems like all of that can be dealt with as you refactor so nothing glaringly wrong IMO
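
A minimal sketch of that suggestion, with a hypothetical `Proposal` type and byte slices standing in for scripts: the signature promises only "something iterable", so the internal storage can change later without breaking callers.

```rust
// Hypothetical type for illustration; not part of the payjoin crate.
pub struct Proposal {
    scripts: Vec<Vec<u8>>,
}

impl Proposal {
    pub fn new(scripts: Vec<Vec<u8>>) -> Self {
        Self { scripts }
    }

    /// Returns an opaque iterable instead of `Vec<&[u8]>`, so the fact
    /// that a `Vec` is used internally does not leak into the public API.
    pub fn scripts_to_check(&self) -> impl IntoIterator<Item = &[u8]> + '_ {
        self.scripts.iter().map(|s| s.as_slice())
    }
}
```

Callers that really need a `Vec` can still call `.into_iter().collect()` on the result.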

@coveralls
Collaborator

coveralls commented Mar 30, 2026

Coverage Report for CI Build 26348330495

Coverage increased (+0.2%) to 85.339%

Details

  • Coverage increased (+0.2%) from the base build.
  • Patch coverage: 56 uncovered changes across 3 files (578 of 634 lines covered, 91.17%).
  • 6 coverage regressions across 2 files.

Uncovered Changes

| File | Changed | Covered | % |
| --- | --- | --- | --- |
| payjoin/src/core/receive/v2/mod.rs | 232 | 195 | 84.05% |
| payjoin/src/core/receive/mod.rs | 277 | 265 | 95.67% |
| payjoin-cli/src/app/v2/mod.rs | 46 | 39 | 84.78% |

Coverage Regressions

6 previously-covered lines in 2 files lost coverage.

| File | Lines Losing Coverage | Coverage |
| --- | --- | --- |
| payjoin/src/core/receive/v2/mod.rs | 5 | 90.54% |
| payjoin-cli/src/app/v2/mod.rs | 1 | 53.62% |

Coverage Stats

Coverage Status
Relevant Lines: 14119
Covered Lines: 12049
Line Coverage: 85.34%
Coverage Strength: 383.85 hits per line

💛 - Coveralls

@xstoicunicornx xstoicunicornx changed the title Async Compatible Interface for Payjoin State Machine Non-blocking Interface for Payjoin State Machine Mar 31, 2026
Contributor

@0xZaddyy 0xZaddyy left a comment

Hi @xstoicunicornx, solid work on the implementation. Just a few small observations, and I have suggested some changes. I also noticed that some new validation methods, like finalize_signed_proposal, don't have error handling.

Comment thread payjoin/src/core/receive/mod.rs Outdated
Comment thread payjoin/src/core/receive/mod.rs Outdated
Comment thread payjoin/src/core/receive/mod.rs Outdated
@xstoicunicornx
Collaborator Author

Thanks for taking a look @0xZaddyy ! All these errors will be going away with the updated implementation however I definitely agree with your suggestions and will keep that error format in mind moving forward.

Comment thread payjoin/src/core/receive/v1/mod.rs Outdated
Comment thread payjoin/src/core/receive/v1/mod.rs Outdated
@xstoicunicornx
Collaborator Author

PR is getting much closer to getting out of draft state. Got comments and tests refined with this latest push.

Some items I would like feedback on:

  • Currently the get_inputs_owned_validator is returned as a Result due to the way that the input's previous output script is looked up, is there a better way to grab the script so that it doesn't require returning Result?
  • The run_async method on the base Validator runs the callback sequentially rather than concurrently, is there a way to run concurrently without requiring additional dependency? Is it important for this to run concurrently?
  • Does the visibility of the validator components look alright?
  • Should get_inputs/outputs_owned/seen_validator methods be kept? They are basically just wrappers for the validator constructors.
  • The existing check_inputs_not_owned and identify_receiver_outputs take a callback that has &Script as an argument but the new run and run_async methods in the InputsOwnedValidator and OutputsOwnedValidator take a callback that has &ScriptBuf as an argument, is this inconsistency alright? I only did this because I could not store Script type within the Validator without wrapping it in a heap allocated type like Box.

Originally I had scoped the FFI bindings to be part of this PR but at this point I think it might be better to open a separate PR for that if no one has any objections.

Please let me know what questions or other feedback you all have.

Collaborator

@benalleng benalleng left a comment

Can't speak for everyone but I have some thoughts on your questions.

Currently the get_inputs_owned_validator is returned as a Result due to the way that the input's previous output script is looked up, is there a better way to grab the script so that it doesn't require returning Result?

Seeing the InputsOwnedValidator::new() I think it's reasonable to use a Result.

The run_async method on the base Validator runs the callback sequentially rather than concurrently, is there a way to run concurrently without requiring additional dependency? Is it important for this to run concurrently?

I think without using something like futures::future::join_all the readability would just nosedive, and I think it would not be worth the nicety of having it concurrent.

Does the visibility of the validator components look alright?

I think that we either have to choose these to be internal and the below methods to be the public wrappers or make all the new new()'s at least somewhat consistent as pub

Should get_inputs/outputs_owned/seen_validator methods be kept? They are basically just wrappers for the validator constructors.

I don't think I have a complete answer but I think just removing the wrappers in favor of making the new impls pub fn makes more sense to me. What does putting them in pub wrappers give us exactly?

The existing check_inputs_not_owned and identify_receiver_outputs take a callback that has &Script as an argument but the new run and run_async methods in the InputsOwnedValidator and OutputsOwnedValidator take a callback that has &ScriptBuf as an argument, is this inconsistency alright? I only did this because I could not store Script type within the Validator without wrapping it in a heap allocated type like Box.

Is there a real benefit to keeping the &Script consistent vs accepting &ScriptBuf if we would still need it to be different by being in a Box<>?
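
For the sequential-versus-concurrent question above, this is roughly what a dependency-free sequential runner looks like. It is a sketch under assumptions: `run_sequential` is a hypothetical name and not the PR's `run_async`, and `block_on` is a toy busy-polling executor included only so the example can be exercised with std alone.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Awaits the async check for each item in turn: simple to read, but the
/// total latency is the sum of all the individual checks.
pub async fn run_sequential<T, F, Fut>(items: &[T], mut check: F) -> Vec<bool>
where
    F: FnMut(&T) -> Fut,
    Fut: Future<Output = bool>,
{
    let mut results = Vec::with_capacity(items.len());
    for item in items {
        results.push(check(item).await);
    }
    results
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor for demonstration only: polls in a loop until ready.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Running the checks concurrently would mean polling several futures at once, which is the part that a helper such as `join_all` from the `futures` crate normally takes care of.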

Comment thread payjoin/src/core/receive/mod.rs Outdated
@xstoicunicornx
Collaborator Author

I have reimplemented this with async callbacks instead. See PR #1546 .

As I was working through implementing the bindings of this PR @spacebear21 prompted me to rethink this approach as the added complexity of introducing the Validator entities didn't seem worthwhile and I agreed.

Additionally, the only reason I had pursued this implementation pattern of extracting the item to be validated and then returning the validation result to advance the state machine was because I had seen the psbt_to_sign method and assumed this was the desired pattern for the ProvisionalProposal and would be useful for the other states as well. However after reviewing the Liana receiver implementation PR, which I believe psbt_to_sign was created for, I understood that this is not a pattern that is applicable to the other typestates.

@xstoicunicornx xstoicunicornx force-pushed the async-compat-api branch 2 times, most recently from c572359 to af81bd7 Compare May 19, 2026 14:45
@xstoicunicornx xstoicunicornx force-pushed the async-compat-api branch 3 times, most recently from e60abbf to 0dde47a Compare May 23, 2026 19:12
@xstoicunicornx xstoicunicornx marked this pull request as ready for review May 23, 2026 19:40
Cover the success and error paths for each validation
step in the receiver's OriginalPayload's broadcast
suitability, input ownership, and input known validation
as well as PsbtContext's proposal finalization.

Restructure match arms in the v2 receiver typestates to
return directly from each branch instead of binding an
intermediate value and returning after the match. Also
rename a shadowed `inner` binding to `payjoin_psbt` for
clarity.

Consolidate standalone receiver processing functions into
a ReceiverProcessor class that encapsulates the payjoin
module, RPC client, and persister. Fix the PJ helper type
to use prototype inference, update the web import paths
from src to dist, and correct the CheckInputsNotSeenCallback
parameter type.

Introduce an implementation-agnostic interface for receiver
typestates that currently require callback-based validation to
advance. Previously, each validation step demanded a synchronous
closure, coupling the state machine to the caller's execution
model. This made integration difficult for wallets where signing,
broadcast checks, or ownership lookups are asynchronous or
handled by a separate process.

Each callback-based transition is now split into a two-phase
pattern: a method to extract the data that needs checking
(get_*_refs, extract_tx_*, psbt_to_sign) and a corresponding
method to submit results and advance the state (apply_*_checks,
apply_broadcast_suitability, finalize_signed_proposal). A
lightweight Reference/TaggedReference framework with typed tags
ensures completeness and ordering of the submitted checks at
runtime.

This applies across v1 and v2 receiver flows, including input
ownership, input-seen, output ownership, broadcast suitability,
proposal finalization, and transaction monitoring.

The original closure-based methods are preserved as convenience
wrappers over the new API, so this is backward-compatible for
existing integrators.

Expose the two-phase validation API from the previous commit
through the FFI bindings layer.

Update integration tests in C#, Dart, JavaScript, and Python
to exercise both callback and nonblocking transition modes.

Migrate both v1 and v2 receiver flows in payjoin-cli from
the callback-based validation API to the two-phase
extract/apply non-blocking API.
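
The two-phase extract/apply pattern described in these commit messages, together with a runtime completeness and ordering check, could look roughly like the sketch below. All names (`OwnershipPending`, `OwnershipCleared`, `InputRef`, `ApplyError`, `input_refs`, `apply_checks`) are hypothetical and only loosely modelled on the description; this is not the PR's actual Reference/TaggedReference code.

```rust
// Hypothetical sketch; byte vectors stand in for scripts.
#[derive(Debug, PartialEq)]
pub enum ApplyError {
    Incomplete { expected: usize, got: usize },
    OutOfOrder { position: usize },
    Owned { position: usize },
}

/// Identifies one item the caller has to check.
#[derive(Debug, Clone, PartialEq)]
pub struct InputRef {
    pub position: usize,
    pub script: Vec<u8>,
}

pub struct OwnershipPending {
    scripts: Vec<Vec<u8>>,
}

#[derive(Debug, PartialEq)]
pub struct OwnershipCleared {
    pub input_count: usize,
}

impl OwnershipPending {
    pub fn new(scripts: Vec<Vec<u8>>) -> Self {
        Self { scripts }
    }

    /// Phase 1: extract the data that needs checking.
    pub fn input_refs(&self) -> Vec<InputRef> {
        self.scripts
            .iter()
            .cloned()
            .enumerate()
            .map(|(position, script)| InputRef { position, script })
            .collect()
    }

    /// Phase 2: submit one (reference, is_owned) pair per input, in order.
    /// The state only advances if the submission is complete and ordered.
    pub fn apply_checks(
        self,
        checks: Vec<(InputRef, bool)>,
    ) -> Result<OwnershipCleared, ApplyError> {
        if checks.len() != self.scripts.len() {
            return Err(ApplyError::Incomplete {
                expected: self.scripts.len(),
                got: checks.len(),
            });
        }
        for (position, (reference, is_owned)) in checks.iter().enumerate() {
            if reference.position != position || reference.script != self.scripts[position] {
                return Err(ApplyError::OutOfOrder { position });
            }
            if *is_owned {
                return Err(ApplyError::Owned { position });
            }
        }
        Ok(OwnershipCleared { input_count: self.scripts.len() })
    }
}
```

Between the two phases the caller is free to run its lookups synchronously, asynchronously, or in another process, which is the decoupling the commit message describes.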
@evanlinjin

evanlinjin commented May 27, 2026

I think these commits should be moved out into separate PRs as they do not align with the main goal of this PR and add noise for the reviewer:

  • 20ddc7b - Introduces some quality tests (from my judgement). However, it does seem irrelevant to the goal here.
  • 446dc9d - Irrelevant refactoring.
  • 70d24b9 - Irrelevant refactoring.


8 participants