2 changes: 1 addition & 1 deletion docs/cli.md
@@ -375,7 +375,7 @@ sb run [--config-file]
| `--host-list` `-l` | `None` | Comma separated host list. |
| `--host-password` | `None` | Host password or key passphrase if needed. |
| `--host-username` | `None` | Host username if needed. |
| `--no-docker` | `False` | Run on host directly without Docker. |
| `--no-docker` | `False` | Run on host directly without Docker. When using remote nodes, SuperBench (`sb` binary and dependencies) must be pre-installed on each target host; otherwise `command not found` will occur. See [Run SuperBench - Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes) for details. |
Comment thread
guoshzhao marked this conversation as resolved.
Outdated

Copilot AI Mar 3, 2026

For readability in this table cell, consider formatting the literal error text as code (e.g., `command not found`) and/or shortening the row by moving the longer explanation into the linked getting-started section. Very long table cells make the Markdown harder to maintain and review.

Suggested change
| `--no-docker` | `False` | Run on host directly without Docker. When using remote nodes, SuperBench (`sb` binary and dependencies) must be pre-installed on each target host; otherwise `command not found` will occur. See [Run SuperBench - Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes) for details. |
| `--no-docker` | `False` | Run on host directly without Docker. See [Run SuperBench - Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes) for details on using this option with remote nodes. |

| `--output-dir` | `None` | Path to output directory, outputs/{datetime} will be used if not specified. |
| `--private-key` | `None` | Path to private key if needed. |

15 changes: 15 additions & 0 deletions docs/getting-started/run-superbench.md
@@ -47,3 +47,18 @@ You can create a privileged container with `superbench/superbench` image, skip `
`sb run --no-docker -l localhost -c resnet.yaml`.

:::

## Using `--no-docker` on Remote Nodes
[SHOULD-FIX] (Maintainability) New section ignores the file's paragraph + bash-fence + admonition style

Issue: Every other prose unit in this file follows the pattern "one short paragraph → `bash`-fenced example → optional `:::note` / `:::tip` admonition" (`## Deploy`, `## Run`). Confirmed by `git grep -nE ':::(tip|note|caution|warning|info)' -- docs/`:

docs/getting-started/installation.mdx:17::::tip Tips
docs/getting-started/installation.mdx:32::::note
docs/getting-started/installation.mdx:61::::note Note
docs/getting-started/run-superbench.md:27::::note Note
docs/getting-started/run-superbench.md:44::::tip TIP
docs/user-tutorial/baseline-generation.md:31::::tip Tips
docs/user-tutorial/result-summary.md:31::::tip Tips

The new section uses none of these: it is one dense, four-item, bold-led numbered list with no runnable code blocks and no admonitions. Items 1 (`command not found`, exit code 127) and 4 (HPC clusters with restricted container runtimes) are textbook `:::caution` / `:::note` material; Options A/B/C are command-driven yet show no commands.

Impact: Future editors will see one section that looks foreign to the rest of the page, increasing drift over time.

Recommendation: Restructure as paragraphs + 1–2 bash code fences + admonitions, e.g.:

### Using `--no-docker` on Remote Nodes

When you run `sb run --no-docker` against remote hosts (via `--host-file` or
`--host-list`), Ansible SSHes into each node and invokes the `sb` binary
directly, so SuperBench must already be installed on every target host.

\`\`\`bash
sb run --no-docker -f remote.ini -c resnet.yaml \\
  --config-override superbench.env.SB_MICRO_PATH=/opt/superbench
\`\`\`

:::caution
If `sb` is not on `PATH` on a remote host, the run fails with
`sb: command not found` (exit code 127).
:::

:::note
Set `SB_MICRO_PATH` (env var or `superbench.env.SB_MICRO_PATH` via
`--config-override`) to the on-host install path of the micro-benchmark
binaries.
:::

Agreement: 2/3 reviewers.

[SHOULD-FIX] (Maintainability) New section should be H3 under ## Run, not a peer H2

Issue: This document's existing H2 structure is the top-level workflow narrative (`## Deploy` → `## Run`). The new section documents requirements for one specific variant of the `sb run` step and is logically a run-time caveat; the existing `:::tip TIP` for the same flag is already correctly nested under `## Run`. Adding it as a third peer H2 implies it is a separate workflow stage and splits `--no-docker` guidance across two sibling H2s.

Impact: The Docusaurus sidebar/TOC will surface "Using --no-docker on Remote Nodes" as a peer to Deploy/Run, and readers will take it for a separate workflow stage.

Recommendation: Demote to H3 under ## Run, immediately after (or merged into) the existing :::tip TIP:

## Run
...
:::tip TIP
... (existing local privileged-container note) ...
:::

### Using `--no-docker` on Remote Nodes
...

Agreement: 1/3 reviewers.

[NON-BLOCKING] (Maintainability) No cross-reference between the pre-existing :::tip TIP and the new section

Issue: Two adjacent blocks on the same flag with no narrative linkage (tip = local privileged container; new section = remote hosts). Not strictly contradictory, but future maintainers may update one and forget the other.

Recommendation: After demoting the new section to H3 under ## Run (sibling finding), add a leading sentence: "The tip above covers running --no-docker locally inside a privileged container. The requirements below apply when --no-docker is used against remote hosts."

Agreement: 1/3 reviewers.


When running `sb run` with `--no-docker` on **remote nodes** (via `--host-file` or `--host-list`), the following requirements apply:

1. **SuperBench must be pre-installed on each remote node.** The `sb` CLI binary and its dependencies must be available in the PATH on every target host. Running without Docker means Ansible will SSH into each node and execute `sb exec` directly; if `sb` is not installed, you will see `command not found` (exit code 127).
[NON-BLOCKING] (Maintainability) "execute sb exec directly" leaks an internal subcommand

Issue: sb exec is the actual remote command (superbench/runner/runner.py:127, superbench/runner/playbooks/cleanup.yaml:5-7) but it is not documented as a user-facing command in docs/cli.md (which lists sb deploy, sb run, sb result …, etc.). Exposing it by name without explanation couples user docs to an implementation detail of runner.py — a future rename will silently rot this line.

Note: This conflicts with the suggestion to name sb exec in cli.md for diagnostic clarity (see the docs/cli.md line 378 comment about the failing binary). Reconcile by: in cli.md, say "the failing binary is sb"; in run-superbench.md, drop the sb exec reference and say "invokes the sb binary directly".

Recommendation: Rephrase, e.g.: "Ansible SSHes into each node and invokes the sb binary directly; if sb is not on PATH you will see sb: command not found (exit code 127)."

Agreement: 1/3 reviewers.
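
To make the `command not found` (exit code 127) failure mode concrete, here is a minimal pre-flight sketch. The helper name `require_cmd` is hypothetical and not part of SuperBench; it only uses the POSIX `command -v` builtin to check whether a command resolves on `PATH`, and returns 127 to mirror what the shell itself reports for a missing command.

```bash
#!/bin/sh
# Hypothetical helper (not part of SuperBench): succeed if the named command
# resolves on PATH in the current shell, otherwise return 127, the same exit
# status a POSIX shell uses for "command not found".
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1 found at $(command -v "$1")"
  else
    echo "missing: $1 is not on PATH" >&2
    return 127
  fi
}

# Example: check for the sb binary on the machine this script runs on.
# The "|| true" keeps the script going so the message is informational only.
require_cmd sb || true
```

One way to apply the same check to a remote node, assuming key-based SSH access and a placeholder host name, is `ssh user@node-0 'command -v sb'`. Non-interactive SSH sessions can see a different `PATH` than a login shell, so it may be worth running the check the same way Ansible would reach the host.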


2. **Deployment options:**
- **Option A:** Extract the contents of the `superbench/superbench` Docker image onto each node (e.g., copy binaries, Python environment, and micro-benchmark executables to a consistent path), then ensure `sb` is in PATH.
- **Option B:** Install SuperBench from source or pip on each node, and build/install the required micro-benchmark binaries (see `third_party/` and build instructions).
[SHOULD-FIX] (Correctness) third_party/ + "build instructions" pointer is unactionable

Issue: third_party/Makefile does exist (targets cuda, rocm, common, cuda_cutlass, …, keyed off SB_MICRO_PATH, MPI_HOME, HIP_HOME, CUDA_VER), but there is no user-facing "build third_party on the host" guide in docs/. docs/getting-started/installation.mdx documents only the control-node build (pip install . && make postinstall). A user following "see third_party/ and build instructions" lands on a Makefile with no surrounding guidance and no required env-var documentation.

Impact: Option B cannot be reproduced from this doc alone, undermining the PR's goal of preventing rc=127.

Recommendation: Replace with a concrete pointer + required vars, e.g.:

   - **Option B:** On each node, install SuperBench (see
     [installation](installation.mdx)) and then build the native
     micro-benchmark binaries with the project Makefile:
     \`\`\`bash
     export SB_MICRO_PATH=/opt/superbench  # must match the value used at runtime
     cd third_party && make -j cuda   # or `make rocm` on AMD
     \`\`\`
     The supported variables (`SB_MICRO_PATH`, `MPI_HOME`, `HIP_HOME`,
     `CUDA_VER`, …) are defined at the top of `third_party/Makefile`.

Agreement: 2/3 reviewers.

- **Option C:** Use `sb deploy` first to pull the image, then manually extract the container filesystem to the host if you need to run without containers.
Comment thread
guoshzhao marked this conversation as resolved.
Outdated

3. **Environment variables:** Set `superbench.env.SB_MICRO_PATH` (and other required env vars) to match the installation path on each node when using `--no-docker`.
Comment thread
guoshzhao marked this conversation as resolved.
Outdated

4. **Use case:** `--no-docker` is intended for environments where Docker-in-Docker or nested containers are not supported (e.g., certain Kubernetes setups, HPC clusters with restricted container runtimes). For standard deployments, prefer `sb deploy` + `sb run` without `--no-docker`.
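
For requirement 3 above, a small sanity check run on each node can catch a mismatched install path before a benchmark starts. This is only a sketch: `check_micro_path` is a hypothetical helper, and it assumes nothing about the layout under `SB_MICRO_PATH` beyond the variable pointing at an existing directory.

```bash
#!/bin/sh
# Hypothetical pre-flight helper (not part of SuperBench): fail early when
# SB_MICRO_PATH is unset, empty, or does not point at an existing directory.
check_micro_path() {
  if [ -z "${SB_MICRO_PATH:-}" ]; then
    echo "SB_MICRO_PATH is not set" >&2
    return 1
  fi
  if [ ! -d "$SB_MICRO_PATH" ]; then
    echo "SB_MICRO_PATH=$SB_MICRO_PATH is not a directory" >&2
    return 1
  fi
  # Print the resolved value so it can be compared across nodes.
  echo "SB_MICRO_PATH=$SB_MICRO_PATH"
}
```

Calling `check_micro_path` on every node and comparing the printed values is one way to confirm that the path passed through `superbench.env.SB_MICRO_PATH` matches what is actually installed on each host.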
[NON-BLOCKING] (Maintainability) No back-link from run-superbench.md to cli.md

Issue: cli.md now links forward to this new section, but the new section never references cli.md. Other tutorial pages routinely back-link to the canonical flag table:

docs/user-tutorial/baseline-generation.md:17: ... [SuperBench CLI](../cli.md).
docs/user-tutorial/result-summary.md:17:   ... [SuperBench CLI](../cli.md).
docs/user-tutorial/data-diagnosis.md:17:   ... [SuperBench CLI](../cli.md).

Readers landing on the new section have no pointer back to --host-file, --host-list, --config-override (all referenced by name).

Recommendation: Add a one-liner at the top or bottom of the new section:

For the full list of flags accepted by `sb run`, see [SuperBench CLI](../cli.md#sb-run).

Agreement: 2/3 reviewers.
