Make retags an implicit part of typed copies#154341
Make retags an implicit part of typed copies#154341RalfJung wants to merge 1 commit intorust-lang:mainfrom
Conversation
This comment has been minimized.
This comment has been minimized.
dbabc07 to
c5a3e40
Compare
This comment has been minimized.
This comment has been minimized.
c5a3e40 to
df515dd
Compare
|
@bors try |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Make retags an implicit part of typed copies
This comment has been minimized.
This comment has been minimized.
df515dd to
76c8c9d
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment was marked as outdated.
This comment was marked as outdated.
|
Looks like enabling validation of references just to keep retags working in const-eval was not a good idea... |
76c8c9d to
d79e607
Compare
|
@bors try |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Make retags an implicit part of typed copies
This comment has been minimized.
This comment has been minimized.
ce09dd9 to
3e548e6
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
3e548e6 to
9886c61
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
9886c61 to
d72567a
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
There was a problem hiding this comment.
I'll investigate this after this PR lands, there are some easy cases, and some annoying ones, so I'll start by keeping around whether the copied operands had retags or not
| // operation for it's alias tracking. It would be wrong for `into_raw_with_allocator` to | ||
| // do the same as that would induce uniqueness assumptions that we only want with | ||
| // the default allocator. | ||
| &mut **b |
There was a problem hiding this comment.
this now also goes through an intermediate mutable reference, why is that?
There was a problem hiding this comment.
It is so that in MIR there ends up being a reference-to-raw-ptr-cast, and that is what Stacked Borrows recognizes. I'll extend the comment.
|
I have a small concern. After the patch, must we be careful about whether to retag or not when creating a new copy? Can we always create a non-retag for transformation? IIUC, the answer is no. This will make Miri much less useful. This can make a transformation pass more confusing. In LLVM, we can just drop UB via dropUBImplyingAttrsAndMetadata. |
|
It's always safe to turn a retagging copy into a non-retag one. So
I think the opposite is true. Currently the optimizations and Miri live in a different world, making it impossible to actually reason about what optimizations do in a way that is coherent with Miri. We'll need to either turn on retags by default or make them implicit in MIR to move mir-opts and Miri into the same world; only then can Miri be useful to check concrete optimization examples. |
d72567a to
f790c9c
Compare
|
@bors try jobs=x86_64-gnu-aux |
This comment has been minimized.
This comment has been minimized.
Make retags an implicit part of typed copies try-job: x86_64-gnu-aux
Hmm, I agree with this PR, but I mean, for example, if I write a pass that turns all retags to non-retags, then Miri cannot have retags. Or do I misunderstand the semantics of retag? I haven't read through Miri, but I think the retag information is important to Miri. To avoid this, we have to be careful when turning retag to non-retag. |
|
Such a pass would remove UB. So, the pass is valid, but it may mean that later optimization passes work less well because they need the retags. The retag information is important to Miri to detect UB. But it's also normal that optimization passes sometimes remove UB. For instance if the code contains a |
|
@bors r=oli-obk |
|
⌛ Testing commit f790c9c with merge 4b76db7... Workflow: https://github.com/rust-lang/rust/actions/runs/25135311557 |
Make retags an implicit part of typed copies Ever since Stacked Borrows was first implemented in Miri, that was done with `Retag` statements: given a place (usually a local variable), those statements find all references stored inside the place and refresh their tags to ensure the aliasing requirements are upheld. However, this is a somewhat unsatisfying approach for multiple reasons: - It leaves open the [question](rust-lang/unsafe-code-guidelines#371) of where to even put `Retag` statements. Over time, the AddRetag pass settled on one possible answer to this, but it wasn't very canonical. - For assignments of the form `*ptr = expr`, if the assignment involves copying a reference, we probably want to do a retag -- but if we do a `Retag(*ptr)` as the next instruction, it can be non-trivial to argue that this even retags the right value, so we refrained from doing retags in that case. This has [come up](llvm/llvm-project#160913 (comment)) as a potential issue for Rust making better use of LLVM "captures" annotations. (That said, there might be [other ways](rust-lang/unsafe-code-guidelines#593 (comment)) to obtain this desired optimization.) - Normal compilation avoids generating retags, but we still generate LLVM IR with `noalias`. What does that even mean? How do MIR optimization passes interact with retags? These are questions we have to figure out to make better use of aliasing information, but currently we can't even really ask such questions. I think we should resolve all that by making retags part of what happens during a typed copy (a concept and interpreter infrastructure that did not exist yet when retags were initially introduced). Under this proposal, when executing a MIR assignment statement, what conceptually happens is as follows: - We evaluate the LHS to a place. - We evaluate the RHS to a value. This does a typed load from memory if needed, raising UB if memory does not contain a valid representation of the assignment's type. - We walk that value, identify all references inside of it, and retag them. If this happens as part of passing a function argument, this is a protecting retag. - We store (a representation of) the value into the place. However, this semantics doesn't fully work: there's a mandatory MIR pass that turns expressions like `&mut ***ptr` into intermediate deref's. Those must *not* do any retags. So far this happened because the AddRetag pass did not add retags for assignments to deref temporaries, but that information is not recorded in cross-crate MIR. Therefore I instead added a field to `Rvalue::Use` to indicate whether this value should be retagged or not. A non-retagging copy seems like a sufficiently canonical primitive that we should be able to express it. Dealing with the fallout from that is a large chunk of the overall diff. (I also considered adding this field to `StatementKind::Assign` instead, but decided against that as we only actually need it for `Rvalue::Use`. I am not sure if this was the right call...) This neatly answers the question of when retags should occur, and handles cases like `*ptr = expr`. It avoids traversing values twice in Miri. It makes codegen's use of `noalias` sound wrt the actual MIR that it is working on. It also gives us a target semantics to evaluate MIR opts against. However, I did not carefully check all MIR opts -- in particular, GVN needs a thorough look under the new semantics; it currently can turn alias-correct code into alias-incorrect code. (But this PR doesn't make things any worse for normal compilation where the retag indicator is anyway ignored.) Another side-effect of this PR is that `-Zmiri-disable-validation` now also disables alias checking. It'd be nicer to keep them orthogonal but I find this an acceptable price to pay. - [rustc benchmark results](#154341 (comment)) - [miri benchmark results](#154341 (comment))
|
@bors yield |
|
Auto build was cancelled. Cancelled workflows: The next pull request likely to be tested is #155979. |
View all comments
Ever since Stacked Borrows was first implemented in Miri, that was done with
Retagstatements: given a place (usually a local variable), those statements find all references stored inside the place and refresh their tags to ensure the aliasing requirements are upheld. However, this is a somewhat unsatisfying approach for multiple reasons:Retagstatements. Over time, the AddRetag pass settled on one possible answer to this, but it wasn't very canonical.*ptr = expr, if the assignment involves copying a reference, we probably want to do a retag -- but if we do aRetag(*ptr)as the next instruction, it can be non-trivial to argue that this even retags the right value, so we refrained from doing retags in that case. This has come up as a potential issue for Rust making better use of LLVM "captures" annotations. (That said, there might be other ways to obtain this desired optimization.)noalias. What does that even mean? How do MIR optimization passes interact with retags? These are questions we have to figure out to make better use of aliasing information, but currently we can't even really ask such questions.I think we should resolve all that by making retags part of what happens during a typed copy (a concept and interpreter infrastructure that did not exist yet when retags were initially introduced). Under this proposal, when executing a MIR assignment statement, what conceptually happens is as follows:
However, this semantics doesn't fully work: there's a mandatory MIR pass that turns expressions like
&mut ***ptrinto intermediate deref's. Those must not do any retags. So far this happened because the AddRetag pass did not add retags for assignments to deref temporaries, but that information is not recorded in cross-crate MIR. Therefore I instead added a field toRvalue::Useto indicate whether this value should be retagged or not. A non-retagging copy seems like a sufficiently canonical primitive that we should be able to express it. Dealing with the fallout from that is a large chunk of the overall diff. (I also considered adding this field toStatementKind::Assigninstead, but decided against that as we only actually need it forRvalue::Use. I am not sure if this was the right call...)This neatly answers the question of when retags should occur, and handles cases like
*ptr = expr. It avoids traversing values twice in Miri. It makes codegen's use ofnoaliassound wrt the actual MIR that it is working on. It also gives us a target semantics to evaluate MIR opts against. However, I did not carefully check all MIR opts -- in particular, GVN needs a thorough look under the new semantics; it currently can turn alias-correct code into alias-incorrect code. (But this PR doesn't make things any worse for normal compilation where the retag indicator is anyway ignored.)Another side-effect of this PR is that
-Zmiri-disable-validationnow also disables alias checking. It'd be nicer to keep them orthogonal but I find this an acceptable price to pay.