✨ Render full RST need directives from @rst blocks #66
chrisjsewell wants to merge 1 commit into main
## Summary
Enable users to embed complete RST need directives inside source code
comments delimited by `@rst ... @endrst` markers.
These blocks are parsed and rendered as real Sphinx-Needs nodes during
the Sphinx build, complementing the existing one-line marker support.
## Motivation
One-line markers (`@req{ID}`) are convenient for simple needs but cannot
express options, content bodies, or arbitrary directive fields.
By allowing full directive syntax inside `@rst` blocks, users can write
rich need items directly in source comments — including `:links:`,
`:status:`, content text, and any other field that `NeedDirective` accepts —
while still getting automatic source-tracing URLs.
## Changes
### `analyse/utils.py`
- **`ParsedDirective` TypedDict** — structured return type capturing
a directive's name, argument, options, content, line offsets, and
whether extra content exists outside the directive body.
- **`parse_single_directive()`** — regex-based parser that extracts
the first directive from an RST text block.
Returns `None` when the first non-blank line is not a directive.
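
The parsing behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `ParsedDirective` field names, both regular expressions, and the convention that `content_line_offset` is a 0-based index into the dedented block are all assumptions made for the example.

```python
import re
import textwrap
from typing import Optional, TypedDict


class ParsedDirective(TypedDict):
    """Hypothetical field names; the PR's TypedDict may differ."""

    name: str
    argument: str
    options: dict[str, Optional[str]]
    content: list[str]
    content_line_offset: Optional[int]  # 0-based index of first content line
    has_extra_content: bool


# ".. name:: argument", where the name may be namespaced ("needs:impl").
_DIRECTIVE_RE = re.compile(r"^\.\.\s+([\w.+-]+(?::[\w.+-]+)*)::(?:\s+(.*))?$")
# ":key: value" or a bare ":flag:".
_OPTION_RE = re.compile(r"^:([^:\s][^:]*):(?:\s+(.*))?$")


def parse_single_directive(text: str) -> Optional[ParsedDirective]:
    """Parse the first directive of an RST block (simplified sketch)."""
    lines = textwrap.dedent(text).splitlines()
    # Find the first non-blank line; it must be the directive line.
    start = next((i for i, line in enumerate(lines) if line.strip()), None)
    if start is None:
        return None
    match = _DIRECTIVE_RE.match(lines[start])
    if match is None:
        return None

    # The directive body runs until the first non-blank, unindented line.
    end = start + 1
    while end < len(lines) and (not lines[end].strip() or lines[end][0] in " \t"):
        end += 1
    body = lines[start + 1 : end]
    has_extra = any(line.strip() for line in lines[end:])

    # Leading ":key: value" lines are options; values stay raw strings.
    options: dict[str, Optional[str]] = {}
    i = 0
    while i < len(body):
        option = _OPTION_RE.match(body[i].strip())
        if option is None:
            break
        options[option.group(1)] = option.group(2)
        i += 1
    # Skip the blank separator between options and content.
    while i < len(body) and not body[i].strip():
        i += 1
    content_lines = body[i:]
    while content_lines and not content_lines[-1].strip():
        content_lines.pop()
    content = textwrap.dedent("\n".join(content_lines)).splitlines()

    return ParsedDirective(
        name=match.group(1),
        argument=(match.group(2) or "").strip(),
        options=options,
        content=content,
        content_line_offset=(start + 1 + i) if content else None,
        has_extra_content=has_extra,
    )
```

The sketch deliberately ignores multi-line arguments and option values that continue on the next line, which is the kind of edge case a regex parser scoped to one directive per block has to either handle explicitly or reject.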
### `sphinx_extension/directives/src_trace.py`
- **`generate_str_link_name()`** widened to accept `Metadata`
(the base class) instead of `OneLineNeed`, so it works for both
one-line needs and marked RST blocks.
- **`render_marked_rst_needs()`** — new method on
`SourceTracingDirective` that iterates `src_analyse.marked_rst`,
parses each block, injects local/remote URL options, constructs a
`NeedDirective` instance, and calls `.run()` to produce docutils
nodes.
- Called from `run()` after `render_needs()`.
### `tests/test_analyse_utils.py`
- Parametrised tests for `parse_single_directive` covering:
minimal directives, options, content bodies, multi-line content,
leading/trailing blanks, extra content detection, no-argument
directives, namespaced directive names, and `None`-return cases.
### `tests/test_src_trace.py` + `tests/doc_test/`
- **`rst_basic`** — integration test with a Sphinx project that uses
only `get_rst = true`. Source file contains a single `/* @rst … @endrst */`
block with an `.. impl::` directive (`RST_IMPL_1`). Verifies the need
node appears in the doctree snapshot.
- **`rst_mixed`** — integration test combining `get_oneline_needs` and
`get_rst` in the same build. Uses `[[ ]]` one-line markers to avoid
clashing with the `@rst` prefix. Source file contains both a one-line
need (`OL_IMPL_1`) and an RST block need (`RST_IMPL_2`). Verifies
both needs appear in the doctree snapshot.
## Design decisions
### Why instantiate `NeedDirective` directly?
`NeedDirective` uses `DummyOptionSpec` — a dummy spec that accepts all
options and keeps them as strings
([sphinx-needs source](https://github.com/useblocks/sphinx-needs/blob/df81a5c/sphinx_needs/directives/need.py#L49)).
`DummyOptionSpec` was introduced in **sphinx-needs v6**
([commit d09332d](useblocks/sphinx-needs@d09332d));
earlier versions use an explicit `option_spec` dict, so passing
arbitrary raw-string options would fail there.
**We should consider raising the minimum dependency to `sphinx-needs>=6`**
(currently `>=5,<9` in `pyproject.toml`).
`NeedDirective.run()` itself does its own key-by-key validation
(via a `match key:` block). This means:
- No option validation/conversion is needed before instantiation.
- Passing raw `dict[str, str | None]` is exactly what the directive expects.
Using `NeedDirective` directly (rather than `add_need()`) gives full
directive feature support: content body, arbitrary options, and
internal NeedDirective logic for `title_from_content`, `delete`, etc.
### Why a custom regex parser instead of docutils parsing?
The `parse_single_directive` function is a purposefully simple regex
parser scoped to the single-directive-per-block use case. Full
docutils RST parsing would be heavier and harder to control for this
constrained input.
## Comparison with MyST-Parser's directive handling
MyST-Parser's
[`run_directive`](https://github.com/executablebooks/MyST-Parser/blob/9364edb/myst_parser/mdit_to_docutils/base.py#L1684)
follows a more general pipeline:
1. **Directive class lookup** via `docutils.parsers.rst.directives.directive()`
— resolves any registered directive and warns on unknowns.
2. **`parse_directive_text()`** — a dedicated parser that validates options
against the directive's `option_spec` (type converters, unknown-key
detection), validates arguments against `required_arguments` /
`optional_arguments` / `final_argument_whitespace`, and supports both
YAML-delimited and RST-style (`:key: value`) option blocks.
3. **Mocked `state` / `state_machine`** (`MockState`, `MockStateMachine`)
— because MyST renders from markdown-it tokens, not from within a
real docutils state machine, it must mock these so directives can
call `nested_parse()`.
4. **Error wrapping** — `DirectiveError` and `MockingError` are caught
and converted to clean error nodes.
**What we don't need from this approach:**
- **Option validation** — `NeedDirective` uses `DummyOptionSpec` and
validates internally, so pre-validation would be redundant.
- **Directive class lookup** — we always target `NeedDirective`.
- **Mocked state** — we run inside a real `SphinxDirective.run()`,
so full docutils `state` / `state_machine` are already available.
**What we could adopt (potential TODOs):**
- [ ] **`content_offset` fallback** — when `content_line_offset` is
`None` the code falls back to `self.content_offset` (the enclosing
`.. src-trace::` directive's offset). This is semantically incorrect
(though harmless when there's no content). Consider using `0` or
adding a clarifying comment.
- [ ] **Bump minimum sphinx-needs to v6** — `DummyOptionSpec`
(which lets us pass arbitrary string options) was added in v6
([d09332d](useblocks/sphinx-needs@d09332d)).
The current constraint is `sphinx-needs>=5,<9`; without bumping it,
`render_marked_rst_needs` will break on sphinx-needs <6 where the
directive uses a fixed `option_spec`.
- [ ] **Per-line source tracking on `StringList`** — the current code
creates `StringList(content_lines, source=src_file)` which sets one
source for all lines. For richer error messages pointing to exact
lines within the RST block, per-line offset info could be added
(docutils `StringList` supports this via the `items` parameter).
- [ ] **Marker clashes between one-line parser and `@rst` blocks** —
when both `get_oneline_needs` and `get_rst` are enabled, comments
containing `@rst ... @endrst` may also be matched by the one-line
parser if its start sequence overlaps with `@rst` (e.g. the default
`@ ` prefix). Currently the analysis pipeline processes every
comment node through *both* extractors independently; there is no
mutual exclusion. This can produce spurious one-line needs or
validation errors from the `@rst` block's content being
misinterpreted as one-line fields. Consider adding a skip/guard so
that comments already claimed by `extract_marked_rst` are not also
fed to `extract_oneline_needs`, or document that users must choose
non-overlapping marker sequences.
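
The skip/guard idea from the last TODO could look roughly like this. It is a sketch under stated assumptions: `find_rst_block`, `find_oneline_need`, and `analyse_comments` are hypothetical stand-ins for the real extractors, comments are plain strings rather than tree-sitter nodes, and the one-line pattern is a crude approximation of a start sequence beginning with `@`.

```python
import re
from typing import Optional

# Hypothetical stand-in patterns, not the project's real marker handling.
RST_BLOCK_RE = re.compile(r"@rst\b(.*?)@endrst", re.DOTALL)
ONELINE_RE = re.compile(r"@\s*(.+)")


def find_rst_block(comment: str) -> Optional[str]:
    """Return the text between @rst and @endrst, if both are present."""
    match = RST_BLOCK_RE.search(comment)
    return match.group(1).strip() if match else None


def find_oneline_need(comment: str) -> Optional[str]:
    """Return the text following the one-line start sequence, if any."""
    match = ONELINE_RE.search(comment)
    return match.group(1).strip() if match else None


def analyse_comments(comments: list[str]) -> tuple[list[str], list[str]]:
    """Run both extractors, letting the @rst extractor claim comments first."""
    rst_blocks: list[str] = []
    oneline_needs: list[str] = []
    for comment in comments:
        block = find_rst_block(comment)
        if block is not None:
            rst_blocks.append(block)
            continue  # guard: a claimed comment never reaches the one-line parser
        need = find_oneline_need(comment)
        if need is not None:
            oneline_needs.append(need)
    return rst_blocks, oneline_needs


comments = [
    "/* @rst\n.. impl:: X\n   :id: A\n@endrst */",
    "// @ Title, OL_1",
]
rst_blocks, oneline_needs = analyse_comments(comments)
```

Without the `continue`, the first comment would also be matched by the one-line pattern (it starts with `@`), which is the spurious-need situation the TODO describes.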
## Key finding: `@rst` blocks require block comments
RST blocks must use C-style block comments (`/* @rst ... @endrst */`)
rather than `//` line comments in C/C++. Tree-sitter parses each `//`
line as a separate comment node, and `extract_rst()` needs both `@rst`
and `@endrst` within a single comment node's text. This is an
inherent constraint of the current tree-sitter based extraction and
should be documented for users.
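
The constraint can be illustrated without tree-sitter by modelling comment nodes as plain strings. In this sketch the node lists are written by hand to mirror the behaviour described above (one node per block comment, one node per `//` line), and `extract_rst_from_node` is a hypothetical stand-in for the real extractor.

```python
import re
from typing import Optional

# Hypothetical stand-in: both markers must appear in the same node's text.
RST_BLOCK_RE = re.compile(r"@rst\b(.*?)@endrst", re.DOTALL)


def extract_rst_from_node(comment_text: str) -> Optional[str]:
    """Return the RST between the markers, or None if either is missing."""
    match = RST_BLOCK_RE.search(comment_text)
    return match.group(1).strip() if match else None


# A block comment is a single node that contains both markers.
block_nodes = ["/* @rst\n.. impl:: Block\n   :id: RST_IMPL_1\n@endrst */"]

# Each `//` line is its own node, so no node contains both markers.
line_nodes = [
    "// @rst",
    "// .. impl:: Line",
    "//    :id: RST_IMPL_1",
    "// @endrst",
]

found_in_block = [extract_rst_from_node(node) for node in block_nodes]
found_in_lines = [extract_rst_from_node(node) for node in line_nodes]
```

Here `found_in_block` holds the directive text while every entry of `found_in_lines` is `None`, because the start and end markers are never seen together.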
Codecov Report
❌ Patch coverage is …

| Metric | main | #66 | +/- |
|---|---|---|---|
| Coverage | 90.06% | 89.86% | -0.21% |
| Files | 29 | 31 | +2 |
| Lines | 2628 | 2772 | +144 |
| Branches | 306 | 327 | +21 |
| Hits | 2367 | 2491 | +124 |
| Misses | 165 | 178 | +13 |
| Partials | 96 | 103 | +7 |

Investigation: Source mapping and directive parsing in docutils — implications for

| Code path | Works correctly? |
|---|---|
| `node.source, node.line = sm.get_source_and_line()` inside nested SM | ✅ If `StringList.items` are set correctly |
| `self.reporter.warning(..., line=lineno)` | ❌ Reporter uses root SM; `lineno` is in the wrong coordinate space |
| `NeedDirective.run()` calling `self.state_machine.get_source_and_line(self.lineno)` with parent SM | ❌ Uses parent SM with a `lineno` from the source file |
| Nodes produced by `nested_parse` carrying source info | ✅ If `StringList.items` are correct |

## Options
### Option A: Fix `StringList` items, accept reporter limitation (recommended)
The most pragmatic approach. Fix the `StringList` construction so that nodes in the output have correct source attribution:

```python
from docutils.statemachine import StringList

# Map each content line to its real position in the source file.
base_offset = marked_rst.source_map["start"]["row"] + parsed["content_line_offset"]
items = [(src_file, base_offset + i) for i in range(len(content_lines))]
content = StringList(content_lines, items=items)
```

The reporter bug is a long-standing docutils issue that affects everyone (including `Include` in edge cases). For build-time warnings from `NeedDirective`, this is acceptable, especially since `NeedDirective` uses `DummyOptionSpec` and does its own validation, making reporter messages from option parsing unlikely.
Pros: Minimal change, fixes the most user-visible issue (node source in output), doesn't fight docutils internals.
Cons: Reporter warnings during build will have wrong line numbers for @rst block content.
### Option B: Use `insert_input` with RST text injection
Instead of parsing the directive ourselves, inject the raw RST text (with URL options pre-injected as RST field-list syntax) into the parent state machine via `insert_input`. This means the lines live in the root SM's `input_lines` and the whole parse happens in the normal docutils flow:

```python
from docutils import statemachine

# Hypothetical approach:
rst_text = marked_rst.rst
# Inject options directly into the RST text before parsing
rst_text = inject_option_line(rst_text, local_url_field, local_link_name)
rst_text = inject_option_line(rst_text, remote_url_field, remote_link_name)
textlines = statemachine.string2lines(rst_text)
self.state_machine.insert_input(textlines, src_file)
return []  # nodes inserted into doctree by state machine, not returned
```

Pros: Eliminates the custom regex parser entirely. Full docutils parsing (multi-line arguments, complex indentation, etc.). Reporter line numbers work correctly. Source mapping works for everything.
Cons: You lose direct control over which directive class handles the text (it goes through normal directive resolution). Injecting options as RST text requires a small helper to insert :field: value lines at the right indentation. The return value is [] (nodes are inserted into the doctree by the state machine, not returned), which changes the control flow. Harder to enforce "only a single directive per block". The StringList constructor used by insert_input generates sequential offsets from 0 — to get correct source-file line offsets you'd need to bypass insert_input and splice a custom StringList with correct items directly into self.state_machine.input_lines.
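
The "small helper" mentioned in the cons could be sketched as below. `inject_option_line` is the hypothetical name used in the Option B snippet; the field name `local-url` in the usage lines and the default three-space indent are assumptions for the example. The helper reuses the indentation of the existing directive body so the inserted option stays aligned with its siblings.

```python
import re

# Matches a directive line and captures its own leading indentation.
_DIRECTIVE_LINE_RE = re.compile(r"^(\s*)\.\.\s+\S+::")


def inject_option_line(
    rst_text: str, field: str, value: str, default_indent: str = "   "
) -> str:
    """Insert ':field: value' directly below the first directive line."""
    lines = rst_text.splitlines()
    for index, line in enumerate(lines):
        match = _DIRECTIVE_LINE_RE.match(line)
        if match is None:
            continue
        base = match.group(1)
        indent = base + default_indent
        # Reuse the indentation of the first non-blank body line, if any.
        for following in lines[index + 1 :]:
            if following.strip():
                current = following[: len(following) - len(following.lstrip())]
                if len(current) > len(base):
                    indent = current
                break
        lines.insert(index + 1, f"{indent}:{field}: {value}")
        return "\n".join(lines)
    raise ValueError("no directive line found in RST text")


rst = ".. impl:: Title\n    :id: RST_IMPL_1\n\n    Body"
patched = inject_option_line(rst, "local-url", "src/main.c#L10")
```

One limitation of this sketch: it assumes the directive argument fits on a single line, since an option inserted after the first line would split a multi-line argument.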
### Option C: Temporarily monkey-patch the reporter
Similar to how Sphinx's `Include` patches `state_machine.insert_input` to emit `include-read` events, temporarily redirect the reporter's `get_source_and_line`:

```python
# Swap in the nested state machine's lookup only for the duration of run().
original = self.state.document.reporter.get_source_and_line
self.state.document.reporter.get_source_and_line = my_nested_sm.get_source_and_line
try:
    result = need_directive.run()
finally:
    self.state.document.reporter.get_source_and_line = original
```

Pros: Reporter messages point to correct source lines.
Cons: Fragile. The reporter method would need to handle both nested content AND the main document simultaneously. Not worth the complexity for build-time warnings.
## Recommendation
Option A — fix the StringList items and document the limitation. The PR's approach of directly instantiating NeedDirective is sound for this use case. The reporter line-number bug is a docutils-wide issue, not something we should try to work around. The important thing is that output nodes carry correct source attribution, which the items fix achieves.
If the custom regex parser ever becomes a maintenance burden (multi-line arguments, complex indentation edge cases), Option B is the path forward — but it requires rethinking the control flow since insert_input doesn't return nodes.
## Other takeaways from Sphinx's `Include`
- `statemachine.string2lines()` should be used for text normalization (tab expansion, whitespace conversion) instead of raw `splitlines()`. This is a small improvement regardless of the larger approach.
- `self.env.note_included()`: Sphinx's `Include` calls this for rebuild tracking. The PR already covers this via `note_dependency` on the whole source file, which is sufficient.
- Sphinx's `include-read` event: an extensibility hook for transforming included text before parsing. Not needed now, but a good pattern if users ever want to preprocess `@rst` blocks.
## Source code references
### docutils
- `StateMachine.insert_input()` and `ViewList`/`StringList`: docutils/statemachine.py (`insert_input` at L385, `get_source_and_line` at L358, `ViewList.__init__` items construction at L1071)
- docutils `Include` directive: docutils/parsers/rst/directives/misc.py (`insert_into_input_lines` at L249, `custom_parse` at L219)
- RST parser states (reporter binding bug, `nested_parse`, `run_directive`): docutils/parsers/rst/states.py (`runtime_init` reporter binding at L237, `nested_parse` at L282, `run_directive` at L2243)

### Sphinx
- Sphinx `Include` override (path resolution, `include-read` event patching): sphinx/directives/other.py L371–L419
- Sphinx `LiteralInclude` (file reading, dependency tracking): sphinx/directives/code.py L414–L469

### This PR
- `parse_single_directive()` and `StringList` construction: analyse/utils.py
- `render_marked_rst_needs()` and `NeedDirective` instantiation: sphinx_extension/directives/src_trace.py L339–L435