
Embed internal module and builtin function bytecode in --compile --bytecode executables#28461

Draft
sosukesuzuki wants to merge 10 commits into main from claude/compile-builtin-bytecode

sosukesuzuki commented Mar 23, 2026

bun build --compile --bytecode already embeds bytecode for user code.
This PR extends it to also embed bytecode for Bun's internal JavaScript:

  • Internal modules (src/js/node/, src/js/bun/, etc.) — 138 modules
    like node:fs, node:http
  • Builtin functions (src/js/builtins/) — 362 functions like
    ReadableStream internals, console methods

At runtime, calls such as require("node:fs") or new ReadableStream() decode
the embedded bytecode instead of parsing the JavaScript source.

Benchmarks

Lightweight app (streams + 6 modules)

           Wall time   init_ms
before     30.0ms      ~8.7ms
after      20.5ms      ~3.0ms
speedup    1.46×       2.9×

Large CLI app (startup)

           Wall time          User time
before     263.4ms            375.5ms
after      245.9ms            349.8ms
delta      -17.5ms (6.6%)     -25.7ms (6.8%)
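The speedup and delta rows are derived from the before/after measurements. The small C++ helpers below reproduce that arithmetic. The function names are illustrative only and are not part of the PR.

```cpp
#include <cassert>
#include <cmath>

// How many times faster "after" is than "before" (e.g. wall time 30.0 / 20.5).
double speedup(double before, double after) {
    return before / after;
}

// Absolute saving, in the same unit as the inputs (milliseconds here).
double delta(double before, double after) {
    return before - after;
}

// Saving expressed as a percentage of the "before" measurement.
double percentReduction(double before, double after) {
    return (before - after) / before * 100.0;
}
```

For example, speedup(30.0, 20.5) is about 1.46 and speedup(8.7, 3.0) is 2.9. For the large CLI app, percentReduction(263.4, 245.9) is about 6.6 and percentReduction(375.5, 349.8) is about 6.8, which matches the delta row.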

Binary size

Standalone executable grows by ~7MB (internal modules: +6.1MB, builtin
functions: +1.1MB). The Bun binary itself is unchanged.

How it works

Build time (bun build --compile --bytecode):

  1. generateBuiltinBytecodes loops over all internal module / builtin
    function sources (via codegen-generated Bun__getInternalModuleSource /
    Bun__getBuiltinFunctionSource extern functions)
  2. Each source is parsed and compiled to bytecode via the new JSC APIs
    recursivelyGenerateUnlinkedCodeBlockForBuiltinFunction and
    encodeBuiltinFunctionExecutable
  3. The results are serialized into the .bun section as 128-byte-aligned
    (kind, id, bytecode) entries
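The PR describes the serialized format only as 128-byte-aligned (kind, id, bytecode) entries in the .bun section. The following is a minimal sketch under assumed details. The BuiltinKind enum, the 16-byte EntryHeader, and serializeEntries are hypothetical. The PR's real layout may differ, for example in how sizes are stored or whether headers are padded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical: distinguishes the two kinds of embedded bytecode.
enum class BuiltinKind : uint8_t { InternalModule = 0, BuiltinFunction = 1 };

struct BuiltinBytecodeInput {
    BuiltinKind kind;
    uint32_t id;                    // assumed: index into a codegen-generated table
    std::vector<uint8_t> bytecode;  // output of the JSC encoder
};

// Hypothetical fixed-size header written before each payload (16 bytes).
struct EntryHeader {
    uint8_t kind;
    uint8_t reserved[3];
    uint32_t id;
    uint64_t size;
};

constexpr std::size_t kAlignment = 128;

std::size_t alignUp(std::size_t n, std::size_t a) { return (n + a - 1) / a * a; }

// Writes [header][zero pad][bytecode][zero pad] for each entry, so every
// header and every payload starts on a 128-byte boundary relative to the
// start of the returned buffer. std::vector::resize zero-fills the padding.
std::vector<uint8_t> serializeEntries(const std::vector<BuiltinBytecodeInput>& inputs) {
    std::vector<uint8_t> out;
    for (const auto& in : inputs) {
        EntryHeader h{};
        h.kind = static_cast<uint8_t>(in.kind);
        h.id = in.id;
        h.size = in.bytecode.size();

        std::size_t headerAt = out.size();
        out.resize(alignUp(headerAt + sizeof h, kAlignment));
        std::memcpy(out.data() + headerAt, &h, sizeof h);

        std::size_t payloadAt = out.size();
        out.resize(alignUp(payloadAt + in.bytecode.size(), kAlignment));
        if (!in.bytecode.empty())
            std::memcpy(out.data() + payloadAt, in.bytecode.data(), in.bytecode.size());
    }
    return out;
}
```

Padding each payload to a 128-byte boundary means a reader can point directly into the mapped section without copying. The cost is some wasted space per entry.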

Runtime (standalone executable):

  1. fromBytes deserializes the BuiltinBytecodeEntry[] table
  2. InternalModuleRegistry::generateModule and the DEFINE_BUILTIN_EXECUTABLES
    macro first try Bun__findInternalModuleBytecode /
    Bun__findBuiltinFunctionBytecode
  3. If an entry is found, decodeBuiltinFunctionExecutable restores the
    executable without parsing; otherwise, they fall back to the existing
    parse path
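The runtime steps follow a "try bytecode, else parse" pattern. The following is a minimal C++ sketch of that control flow. The BuiltinBytecodeEntry fields, findBytecode, Executable, and the decode/parse callbacks are hypothetical stand-ins. They are not the PR's actual types or the signatures of Bun__find*Bytecode or decodeBuiltinFunctionExecutable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

enum class BuiltinKind : uint8_t { InternalModule = 0, BuiltinFunction = 1 };

// One deserialized entry; bytecode is assumed to point into the mapped .bun section.
struct BuiltinBytecodeEntry {
    BuiltinKind kind;
    uint32_t id;
    const uint8_t* bytecode;
    std::size_t size;
};

// Hypothetical lookup (linear scan for clarity; a real table could index by id).
const BuiltinBytecodeEntry* findBytecode(const std::vector<BuiltinBytecodeEntry>& table,
                                         BuiltinKind kind, uint32_t id) {
    for (const auto& e : table)
        if (e.kind == kind && e.id == id) return &e;
    return nullptr;
}

// Stand-in for a restored or parsed executable; records which path produced it.
struct Executable { std::string origin; };

// Try the embedded bytecode first. If there is no entry, or decoding yields
// nothing, fall back to parsing the source exactly as before.
template <typename Decode, typename Parse>
Executable loadExecutable(const std::vector<BuiltinBytecodeEntry>& table,
                          BuiltinKind kind, uint32_t id, Decode decode, Parse parse) {
    if (const auto* entry = findBytecode(table, kind, id)) {
        if (std::optional<Executable> exe = decode(*entry)) return *exe;
    }
    return parse(kind, id);
}
```

Treating a failed decode the same as a missing entry keeps the fallback path as the single source of truth. As a result, a stale or incompatible bytecode blob degrades to normal parsing instead of failing.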

Constraints

  • Disabled for cross-compilation (bytecode is arch-dependent)
  • Debug builds skip bytecode generation (sources are loaded from disk via
    BUN_DYNAMIC_JS_LOAD_PATH)
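Taken together, the constraints amount to a single gating decision at build time. The following is a minimal sketch of that decision. The BuildOptions struct, its field names, and the idea of comparing target and host strings such as "bun-linux-x64" are assumptions for illustration. The PR's real check may be structured differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical options; field names are illustrative, not Bun's.
struct BuildOptions {
    bool compile = false;          // --compile
    bool bytecode = false;         // --bytecode
    std::string target;            // target of the standalone executable
    std::string host;              // platform running `bun build`
    bool debugBuildOfBun = false;  // Bun itself is a debug build
};

bool shouldEmbedBuiltinBytecode(const BuildOptions& o) {
    // Only meaningful for standalone executables built with --bytecode.
    if (!o.compile || !o.bytecode) return false;
    // Bytecode is architecture-dependent, so skip it when cross-compiling.
    if (o.target != o.host) return false;
    // Debug builds load internal JS from disk via BUN_DYNAMIC_JS_LOAD_PATH.
    if (o.debugBuildOfBun) return false;
    return true;
}
```

Skipping the builtin bytecode in these cases is safe because the runtime always falls back to parsing the embedded sources.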

Depends on WebKit PR: oven-sh/WebKit#177

robobun commented Mar 23, 2026

Updated 1:17 PM PT - Mar 26th, 2026

@sosukesuzuki, your commit fe95893 has 4 failures in Build #42347 (All Failures):


🧪 To try this PR locally:

bunx bun-pr 28461

That installs a local version of the PR into your bun-28461 executable, so you can run:

bun-28461 --bun


// Standalone executables built with --bytecode embed pre-generated bytecode
// for internal modules. Try decoding that first to skip parsing.
if (void* bunVM = defaultGlobalObject(globalObject)->bunVM()) {

Can we set an extern "C" bool that says "please check for this here"?

// Builtin bytecode generation needs Bun's private identifiers (@isCallable etc)
// registered in the VM. The plain getVMForBytecodeCache() VM doesn't have
// JSVMClientData, so parsing builtin sources would fail.
static JSC::VM& getVMForBuiltinBytecodeCache()

Can we always use this VM instead of choosing between the two?
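The suggestion is to keep one lazily created VM that always carries the client data builtins need. The later commit adopts this by folding the two getters together. The following is a minimal C++ sketch of that shape. VM and ClientData are hypothetical stand-ins for JSC::VM and JSVMClientData, not the real classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for JSC::VM and Bun's JSVMClientData.
struct ClientData {};
struct VM { std::unique_ptr<ClientData> clientData; };

// One lazily created VM for all bytecode caching. Client data is always
// attached, so builtin sources that use private identifiers can be parsed.
// Module and CJS code generation does not need it but is unaffected by it.
// Function-local statics are initialized once, thread-safely, in C++11 and later.
VM& getVMForBytecodeCache() {
    static VM vm = [] {
        VM v;
        v.clientData = std::make_unique<ClientData>();
        return v;
    }();
    return vm;
}
```

With a single getter, callers no longer have to choose the correct VM for each kind of source, which removes a class of "wrong VM" bugs.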

Jarred-Sumner left a comment

I like this idea

…he VM

- Add Bun__hasEmbeddedBuiltinBytecode so C++ can skip the extern lookup
  call entirely on normal Bun runs. The Zig lookup functions now read
  the global StandaloneModuleGraph instance instead of threading bunVM
  through every call site.
- Fold getVMForBuiltinBytecodeCache into getVMForBytecodeCache by always
  attaching JSVMClientData. Module/CJS codegen doesn't need it but works
  fine with it.
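The first bullet describes a fast path: check a cheap flag before making the cross-language lookup call. The following is a minimal C++ sketch of that gating pattern. hasEmbeddedBuiltinBytecode and findInternalModuleBytecode are hypothetical stand-ins for Bun__hasEmbeddedBuiltinBytecode and Bun__findInternalModuleBytecode, whose real signatures are not shown here. The call counter exists only to make the effect observable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; in Bun the real functions live on the Zig side
// and read the global StandaloneModuleGraph instance.
extern "C" {
bool g_hasEmbeddedBuiltinBytecode = false;  // assumed: set once when the graph loads
int g_lookupCalls = 0;                      // instrumentation for this example only

bool hasEmbeddedBuiltinBytecode() { return g_hasEmbeddedBuiltinBytecode; }

const void* findInternalModuleBytecode(uint32_t id) {
    ++g_lookupCalls;
    (void)id;
    return nullptr;  // a real lookup would return the matching entry
}
}

// On a normal `bun` run the flag is false, so only one cheap check is made
// and the extern lookup is never called.
const void* tryEmbeddedModuleBytecode(uint32_t id) {
    if (!hasEmbeddedBuiltinBytecode()) return nullptr;
    return findInternalModuleBytecode(id);
}
```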