Add module-vendor mode to bashkit-coreutils-port #1533

@chaliy

Background

bashkit-coreutils-port currently has one mode: rewrite a uu_app() clap definition into a generated <util>_args.rs (#1529). For reuse beyond argument surfaces — e.g. swapping printf.rs's handwritten format-spec parser for uucore's — we need a second mode that vendors entire library modules from uucore.

Verification on a probe branch ruled out depending on uucore at runtime:

| Metric | Baseline | uucore runtime dep | Delta |
| --- | --- | --- | --- |
| bashkit-cli size | 24,260,000 B | 24,269,344 B | +9 KB (declared, unused) |
| Cold release build time | 1m 37s | 3m 15s | +98 s |
| wasm32-unknown-unknown build | ✅ pass | ❌ fail (uucore → rustix → errno) | hard blocker |

The format module inside uucore is platform-clean — its imports are std, bigdecimal, num-traits, unit-prefix, os_display, plus a small number of uucore-internal types (extendedbigdecimal, UError, NonUtf8OsStrError). The drag comes from other parts of uucore that get pulled in via the parent crate, not from the modules themselves.
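As a rough illustration of how such an import audit could be automated, the sketch below splits a module's `use` lines into the external crates listed above and everything else. The function name `classify_imports`, the allow-list constant, and the single-line `use` matching are assumptions made for this example, not part of the existing tool; a real implementation would more likely walk the syntax tree.

```rust
/// Crate roots the issue lists as platform-clean (Rust identifiers use underscores).
const EXTERNAL_ROOTS: [&str; 5] = ["std", "bigdecimal", "num_traits", "unit_prefix", "os_display"];

/// Splits the `use` paths found in `source` into (external, internal).
/// Hypothetical helper: only plain single-line `use path;` items are recognized.
fn classify_imports(source: &str) -> (Vec<String>, Vec<String>) {
    let mut external = Vec::new();
    let mut internal = Vec::new();
    for line in source.lines() {
        let line = line.trim();
        // Skip anything that is not a `use` item.
        let Some(rest) = line.strip_prefix("use ") else { continue };
        let path = rest.trim_end_matches(';').trim();
        // The first path segment decides the bucket.
        let root = path.split("::").next().unwrap_or("");
        if EXTERNAL_ROOTS.contains(&root) {
            external.push(path.to_string());
        } else {
            // Anything outside the allow-list needs an explicit decision.
            internal.push(path.to_string());
        }
    }
    (external, internal)
}
```

Everything that lands in the second list is a candidate for a per-import substitution rule.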

Conclusion: vendor selected modules at port time using the same machinery (and drift CI) that the args path uses today. This issue covers the tool capability only; the actual vendoring of uucore::format and the printf migration are the first consumer of this capability and live in #1534.

Acceptance criteria

Tool surface

  • bashkit-coreutils-port gains a port-module invocation (subcommand or distinct CLI verb) accepting:
    • uutils source dir
    • module path under src/uucore/src/lib/features/<name>/
    • output dir (under crates/bashkit/src/builtins/generated/<name>/)
    • a small config (TOML or inline flags) declaring per-import substitutions: (internal_path, action) where action is one of inline (copy the type's defining file in too), replace_with(local_path) (rewrite the import to a bashkit-side equivalent), or error (abort the port — surface unexpected internal references rather than silently emit).
  • The tool's existing args mode is unchanged.
  • Output files carry the same banner shape as args output: // GENERATED by bashkit-coreutils-port. DO NOT EDIT. + uutils rev + regenerate command + MIT attribution.
  • Verbatim copy by default; a future enhancement may add a syn-based rewriter for modules that need it. format/ does not — recommend landing this issue with verbatim-copy only and adding rewrite when the first module demands it.
  • Fluent / i18n imports trigger a hard error (unresolved import: 'fluent::*' or similar) that tells the operator the module is not safely vendorable without code changes.
  • Module-vendoring metadata lives in a single manifest (e.g. crates/bashkit-coreutils-port/vendored.toml) listing each vendored module + its source path + uutils revision. Drift workflow reads from this manifest; adding a new vendored module is one TOML stanza.
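
The per-import substitution model from the list above could be sketched roughly as follows. The three actions mirror the ones named in this issue; the type names `Action` and `Resolution`, the `resolve` function, and the choice to treat an import with no rule the same as an explicit error rule are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// What to do with one uucore-internal import (actions as named in this issue).
#[derive(Debug, Clone, PartialEq)]
enum Action {
    /// Copy the type's defining file into the output dir as well.
    Inline,
    /// Rewrite the import to a bashkit-side equivalent.
    ReplaceWith(String),
    /// Abort the port.
    Error,
}

/// Outcome of resolving a single import path (hypothetical shape).
#[derive(Debug, PartialEq)]
enum Resolution {
    CopyDefiningFile(String),
    Rewrite { from: String, to: String },
}

/// Applies the configured rule to `import`, or returns an error message.
fn resolve(import: &str, rules: &HashMap<String, Action>) -> Result<Resolution, String> {
    // Fluent / i18n imports are a hard error regardless of configuration.
    if import == "fluent" || import.starts_with("fluent::") {
        return Err(format!(
            "unresolved import: '{import}' (module is not safely vendorable without code changes)"
        ));
    }
    match rules.get(import) {
        Some(Action::Inline) => Ok(Resolution::CopyDefiningFile(import.to_string())),
        Some(Action::ReplaceWith(local)) => Ok(Resolution::Rewrite {
            from: import.to_string(),
            to: local.clone(),
        }),
        // Assumption: a missing rule aborts just like an explicit `error` rule,
        // so unexpected internal references surface instead of being emitted.
        Some(Action::Error) | None => {
            Err(format!("unexpected internal reference: '{import}'"))
        }
    }
}
```

With a rule mapping `uucore::error::UError` to `ReplaceWith("crate::error::Error")`, `resolve` returns a `Rewrite`; any import without a rule, and any `fluent::` import, returns an `Err` that the tool would report before aborting.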

CI

  • .github/workflows/coreutils-args-drift.yml extends to also iterate the manifest and re-run port-module for each vendored module against uutils HEAD. The same auto-PR fires when args or module output diffs. Single PR per drift run, not one PR per module.
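
The "single PR per drift run" rule can be sketched as an aggregation step that runs after every module has been regenerated. The `ModuleOutput` struct, the function names, and the PR body wording below are hypothetical; the sketch only shows that args drift and module drift feed one combined report.

```rust
/// One vendored module as the drift job might see it (hypothetical shape).
struct ModuleOutput {
    name: String,
    committed: String,
    regenerated: String,
}

/// Names of modules whose regenerated output differs from the committed output.
fn drifted(modules: &[ModuleOutput]) -> Vec<&str> {
    modules
        .iter()
        .filter(|m| m.committed != m.regenerated)
        .map(|m| m.name.as_str())
        .collect()
}

/// Builds one PR body covering args drift and all module drift,
/// or returns None when nothing changed and no PR should be opened.
fn pr_body(args_drifted: bool, modules: &[ModuleOutput]) -> Option<String> {
    let changed = drifted(modules);
    if !args_drifted && changed.is_empty() {
        return None;
    }
    let mut body = String::from("Coreutils drift against uutils HEAD\n");
    if args_drifted {
        body.push_str("- args output changed\n");
    }
    for name in changed {
        body.push_str(&format!("- module output changed: {name}\n"));
    }
    Some(body)
}
```

Collecting all diffs before deciding whether to open a PR keeps the review in one place, however many modules the manifest lists.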

Documentation

  • specs/coreutils-args-port.md gains a "Module mode" section:
    • When to use it (vendoring small platform-clean modules) vs args mode (rewriting uu_app()).
    • The substitution model for uucore-internal types.
    • The manifest format and drift-CI integration.
  • crates/bashkit-coreutils-port/src/main.rs module-level doc updated to reflect the second mode.
  • specs/maintenance.md § Coreutils Argument-Surface Drift broadens to cover module-mode review, not just args.

Out of scope

  • Actually vendoring uucore::format and migrating printf.rs. That is the first consumer and lives in #1534 ("Vendor uucore::format and migrate printf.rs to use it").
  • Vendoring ranges, parser-num, version-cmp, quoting-style, etc. — file separate consumer issues per module on demand.
  • A syn-based module rewriter (only added when a module needs it).

Open questions

  • Manifest location. crates/bashkit-coreutils-port/vendored.toml (next to the tool) vs crates/bashkit/src/builtins/generated/vendored.toml (next to the output). Recommend the former — the tool owns it, the workflow only reads it.
  • License attribution. Extend the existing THIRD_PARTY_LICENSES entry that covers args codegen with a module list, vs new entry. Recommend extending — same upstream, same MIT terms.
  • Internal-type substitution UX. Verbose TOML vs declarative comments inside the source like // codegen: substitute uucore::error::UError -> crate::error::Error. Recommend the TOML for centralization; the substitution rules are policy, not source.
