Add CHEDDAR GPU backend/emitter#2896

Open
AlexanderViand wants to merge 1 commit into google:main from AlexanderViand:cheddar-backend

Conversation


@AlexanderViand AlexanderViand commented Apr 23, 2026

Part of the work to make Cheddar a first-class backend, extracted to its own PR because the diff was getting out of hand :)

This adds the backend itself, i.e., the cheddar dialect, the --scheme-to-cheddar passes, the emitter, etc., and a few basic tests (including some very simple "e2e" ones). It also adds Cheddar itself as an optional (default-off) Bazel dependency.

Question: Do we want to run the e2e GPU tests on CI? If yes, what runner/etc should we be using?

This PR does not yet enable HEIR to target more complex programs to Cheddar, because Cheddar has very strict ideas about the acceptable scales of things, and with just this PR, HEIR's mgmt layer still has no idea about the Rational-Rescale-ish "25-30" system Cheddar uses.

@j2kun j2kun self-requested a review May 4, 2026 16:19

@j2kun j2kun left a comment

I will review the IR parts after office hours, but one major request below.

Comment on MODULE.bazel
)

# CHEDDAR GPU FHE library (opt-in, requires CUDA)
cheddar_extensions = use_extension("//bazel:extensions.bzl", "cheddar_deps")

So there's a lot of stuff in this PR around adding the cheddar backend (bazel overlay, opt-in config, etc.) but as far as I can tell nothing tests it because there are no end-to-end tests in this PR.

Can we either include an end-to-end test in this PR, or else have this PR only include the MLIR-only parts? (dialect, transforms, lit tests, emitter).

In particular, when I pull this branch to my local machine and I try running bazel build @cheddar, I get

$ bazel build @cheddar  
ERROR: /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/rules_cuda++toolchain+cuda/BUILD: no such target '@@rules_cuda++toolchain+cuda//:thrust': target 'thrust' not declared in package '' defined by /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/rules_cuda++toolchain+cuda/BUILD
ERROR: /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/+cheddar_deps+cheddar/BUILD.bazel:50:11: no such target '@@rules_cuda++toolchain+cuda//:thrust': target 'thrust' not declared in package '' defined by /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/rules_cuda++toolchain+cuda/BUILD and referenced by '@@+cheddar_deps+cheddar//:cheddar'
ERROR: Analysis of target '@@+cheddar_deps+cheddar//:cheddar' failed; build aborted: Analysis failed
INFO: Elapsed time: 1.340s, Critical Path: 0.02s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully


j2kun commented May 5, 2026

> Question: Do we want to run the e2e GPU tests on CI? If yes, what runner/etc should we be using?

A quick look into this and it appears the Google org does not have any runners configured with a GPU. So this may be an obstacle and might force us to set up a Google-internal CI that runs on GCP and mirrors its results to GitHub.

Comment on lines +11 to +14
let summary = "Attribute for CHEDDAR";
let description = [{
This attribute represents configuration values for CHEDDAR.
}];

Suggested change
let summary = "Attribute for CHEDDAR";
let description = [{
This attribute represents configuration values for CHEDDAR.
}];

nit: every def will override these fields.

}];
let parameters = (ins
StringRefParameter<"path to JSON parameter file">:$path,
DefaultValuedParameter<"bool", "false", "use 64-bit word type">:$use64Bit

This field appears to be unused. (In fact, this entire attribute appears to be unused)

CHEDDAR is a CKKS-only GPU-accelerated FHE library. It supports both 32-bit
and 64-bit word types, with 32-bit being the primary fast path on GPUs.

See https://github.com/scale-snu/cheddar-fhe

Suggested change
See https://github.com/scale-snu/cheddar-fhe
See [the Cheddar GitHub repository](https://github.com/scale-snu/cheddar-fhe)

let results = (outs Cheddar_Ciphertext:$output);
}

def Cheddar_MadUnsafeOp : Cheddar_Op<"mad_unsafe"> {

Why "mad" and not "mac"?

// `GetRotKeyOp` stores a static distance attribute, but dynamic
// `ckks.rotate` lowering still needs a placeholder key op so the emitter can
// trace back to the `UserInterface`. This sentinel distance marks that case.
constexpr int64_t kDynamicRotationKeyDistanceSentinel = -1;

Is there a reason why you don't want to allow the GetRotKeyOp to take both dynamic and static options as well?
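For context, the sentinel convention the snippet above describes can be modeled in a few lines; this is an illustrative Python sketch (the function name and the "dynamic" return value are hypothetical, not HEIR's actual code):

```python
# Sentinel pattern for rotation-key distances: a known distance is
# stored directly as an attribute, while a dynamic rotation reuses a
# reserved value so later stages know to resolve it at emit time.
DYNAMIC_DISTANCE_SENTINEL = -1  # mirrors kDynamicRotationKeyDistanceSentinel

def rot_key_distance(attr_value):
    """Interpret a stored distance attribute (illustrative only)."""
    if attr_value == DYNAMIC_DISTANCE_SENTINEL:
        return "dynamic"  # distance comes from an SSA value at runtime
    return attr_value     # static distance, known at compile time

print(rot_key_distance(-1))  # dynamic
print(rot_key_distance(16))  # 16
```

The reviewer's question amounts to asking whether the op could carry an optional static attribute plus an optional dynamic operand instead of overloading one attribute with a sentinel.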


Seeing this emitter makes me think we should invest some time into converting these emitters to use the emitc dialect, so that the vast majority of the codegen details can be avoided. 😰

Comment on lines +473 to +483
while (currentLevel > targetLevelVal) {
current = cheddar::RescaleOp::create(rewriter, op.getLoc(), cheddarCtType,
ctx.value(), current);
--currentLevel;
auto encodedOne = cheddar::EncodeConstantOp::create(
rewriter, op.getLoc(), cheddar::ConstantType::get(getContext()),
encoder.value(), one, rewriter.getI64IntegerAttr(currentLevel),
rewriter.getI64IntegerAttr(*logDefaultScale));
current =
cheddar::MultConstOp::create(rewriter, op.getLoc(), cheddarCtType,
ctx.value(), current, encodedOne);

A comment to explain why this needs a loop would be helpful.
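One plausible reading of the loop, sketched as an illustrative Python model (the names, the assumption that each rescale drops exactly one level, and the constant LOG_DEFAULT_SCALE are assumptions made here, not CHEDDAR's API): rescale can only consume one level per application, so dropping several levels requires iterating, and the multiply-by-encoded-one after each rescale restores the default scale at the new level.

```python
LOG_DEFAULT_SCALE = 40  # assumed log2 of the default scale

def adjust_level(ct_level, ct_log_scale, target_level):
    """Drop a ciphertext from ct_level to target_level one rescale at a time."""
    while ct_level > target_level:
        ct_level -= 1                      # rescale consumes one level...
        ct_log_scale -= LOG_DEFAULT_SCALE  # ...and divides out one modulus
        # multiplying by 1 encoded at scale 2^LOG_DEFAULT_SCALE restores
        # the default scale at the new level
        ct_log_scale += LOG_DEFAULT_SCALE
    return ct_level, ct_log_scale

print(adjust_level(5, 40, 2))  # (2, 40): three levels dropped, scale preserved
```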

// multiplies), so it maps directly onto CHEDDAR's logScale field.
int64_t logScale = invEncoding.getScalingFactor();

// lwe.rlwe_encode doesn't carry a plaintext level on main, so encode at

It has a RingAttr, doesn't it? Isn't that supposed to encode the level via its RNS coefficient type? We might not instantiate those properly at the moment, but I know for sure the level analysis properly determines the needed levels of plaintexts involved in Pt-Ct ops, cf. encodeCleartextAsPlaintext. If there is a gap here, please find/file the right GH issue and link it here with a TODO.

target.addLegalDialect<cheddar::CheddarDialect>();
target.addIllegalDialect<ckks::CKKSDialect>();
target
.addIllegalOp<lwe::RLWEEncryptOp, lwe::RLWEDecryptOp, lwe::RLWEEncodeOp,

Why not make the LWE dialect illegal wholesale? Cheddar has its own types, after all.

return failure();
}

// Scaling factors on CKKS encodings are stored log-additively on main (a

nit: comments talking about "on main" will become strange once they are merged into main.
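For readers unfamiliar with the convention the comment refers to: CKKS scaling factors multiply under ciphertext-ciphertext multiplication, so storing their log2 turns multiplication into addition, which is why a log-additive scaling factor maps directly onto a logScale-style field. A small illustrative check (the variable names and the 2^40 scale are arbitrary examples):

```python
# Scales multiply under CKKS multiplication, so their logs add.
log_scale_a = 40  # operand a has scale 2^40
log_scale_b = 40  # operand b has scale 2^40

# After a*b, the product's scale is 2^40 * 2^40 = 2^80,
# i.e. the log-stored scaling factors simply add.
product_log_scale = log_scale_a + log_scale_b

assert 2.0**product_log_scale == (2.0**log_scale_a) * (2.0**log_scale_b)
print(product_log_scale)  # 80
```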
