SabrinaJewson 3 years ago
# SeqCst
`SeqCst` is probably the most interesting ordering, because it is simultaneously
the simplest and the most complex atomic memory ordering in existence. It's
simple, because if you only use `SeqCst` everywhere then you can kind of
maybe pretend that the Abstract Machine has a concept of time: phrases like
“latest value” make sense, the program can be thought of as a set of steps that
interleave, there is a universal “now” and “before”, and wouldn't that be nice?
But it's also the most complex, because as soon as you look under the hood you
realize just how incredibly convoluted and hard to follow the actual rules
behind it are, and it gets really ugly really fast as soon as you try to mix it
with any other ordering.
To understand `SeqCst`, we first have to understand the problem it exists to
solve. The first complexity is that this problem can only be observed in the
presence of at least four different threads _and_ two separate atomic variables;
anything less and it's not possible to notice a difference. The common example
used to show where weaker orderings produce counterintuitive results is this:
```rust
# use std::sync::atomic::{self, AtomicBool};
use std::thread;
// Set this to Relaxed, Acquire, Release or AcqRel, it doesn't matter — the result is
// the same (modulo panics caused by attempting acquire stores or release
// loads).
const ORDERING: atomic::Ordering = atomic::Ordering::Relaxed;
static X: AtomicBool = AtomicBool::new(false);
static Y: AtomicBool = AtomicBool::new(false);
let a = thread::spawn(|| { X.store(true, ORDERING) });
let b = thread::spawn(|| { Y.store(true, ORDERING) });
let c = thread::spawn(|| { while !X.load(ORDERING) {} Y.load(ORDERING) });
let d = thread::spawn(|| { while !Y.load(ORDERING) {} X.load(ORDERING) });
let a = a.join().unwrap();
let b = b.join().unwrap();
let c = c.join().unwrap();
let d = d.join().unwrap();
# return;
// This assert is allowed to fail.
assert!(c || d);
```
The basic setup of this code, for all of its possible executions, looks like
this:
```text
a static X c d static Y b
╭─────────╮ ┌───────┐ ╭─────────╮ ╭─────────╮ ┌───────┐ ╭─────────╮
│ store X ├─┐ │ false │ ┌─┤ load X │ │ load Y ├─┐ │ false │ ┌─┤ store Y │
╰─────────╯ │ └───────┘ │ ╰────╥────╯ ╰────╥────╯ │ └───────┘ │ ╰─────────╯
└─┬───────┐ │ ╭────⇓────╮ ╭────⇓────╮ │ ┌───────┬─┘
│ true ├─┘ │ load Y ├─? ?─┤ load X │ └─┤ true │
└───────┘ ╰─────────╯ ╰─────────╯ └───────┘
```
In other words, `a` and `b` are guaranteed to, at some point, store `true` into
`X` and `Y` respectively, and `c` and `d` are guaranteed to, at some point, load
those values of `true` from `X` and `Y` (there could also be an arbitrary number
of loads of `false` by `c` and `d`, but they've been omitted since they don't
actually affect the execution at all). The question now is: when `c` and `d`
load from `Y` and `X` respectively, is it possible for them _both_ to load
`false`? Looking at this diagram, there's absolutely no reason why not. There
isn't even a single arrow connecting the left and right hand sides so far, so
each load has no restrictions on which value it is allowed to pick — and this
goes for both sides equally, so we could end up with an execution like this:
```text
a static X c d static Y b
╭─────────╮ ┌───────┐ ╭─────────╮ ╭─────────╮ ┌───────┐ ╭─────────╮
│ store X ├─┐ │ false ├┐ ┌┤ load X │ │ load Y ├┐ ┌┤ false │ ┌─┤ store Y │
╰─────────╯ │ └───────┘│ │╰────╥────╯ ╰────╥────╯│ │└───────┘ │ ╰─────────╯
└─┬───────┐└─│─────║──────┐┌─────║─────│─┘┌───────┬─┘
│ true ├──┘╭────⇓────╮┌─┘╭────⇓────╮└──┤ true │
└───────┘ │ load Y ├┘└─┤ load X │ └───────┘
╰─────────╯ ╰─────────╯
```
Which results in a failed assert. This execution is possible because the
model of separate modification orders means that there is no relative ordering
between `X` and `Y` being changed, and so each thread is allowed to “see” either
order. However, some algorithms require a globally agreed-upon ordering, and
this is where `SeqCst` comes in useful.
This ordering, first and foremost, inherits the guarantees of all the other
orderings — it is an acquire operation for loads, a release operation for stores
and an acquire-release operation for RMWs. In addition to this, it gives some
guarantees unique to `SeqCst` about what values it is allowed to load. Note that
these guarantees are not about preventing data races: unless some unrelated
code triggers a data race under an unexpected condition, `SeqCst` can only
protect you from race conditions, because its guarantees apply only to other
`SeqCst` operations and not to all data accesses.
## S
`SeqCst` is fundamentally about _S_, which is the global ordering of all
`SeqCst` operations in an execution of the program. It is consistent between
every atomic and every thread, and every load, store, RMW and fence that uses
the sequentially consistent ordering has a place in it (but no other operations
do). It stands in contrast to modification orders, which are similarly total but
scoped to a single atomic rather than the whole program.
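As a concrete illustration (everything in this sketch, including the names, is invented for the example), here is which operations do and don't receive a place in _S_:
```rust
use std::sync::atomic::{self, AtomicU32, Ordering::{Relaxed, SeqCst}};
static A: AtomicU32 = AtomicU32::new(0);
// Performs one operation of each kind and returns the final value of `A`.
fn demo() -> u32 {
    A.store(1, SeqCst);             // SeqCst store: has a place in S
    let _ = A.load(SeqCst);         // SeqCst load: has a place in S
    let _ = A.fetch_add(1, SeqCst); // SeqCst RMW: has a place in S
    atomic::fence(SeqCst);          // SeqCst fence: in S despite touching no atomic
    A.store(3, Relaxed);            // Relaxed store: in A's modification order, not in S
    A.load(Relaxed)                 // Relaxed load: not in S at all
}
fn main() {
    assert_eq!(demo(), 3);
}
```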
Other than an edge case involving `SeqCst` mixed with weaker orderings (detailed
in the next section), _S_ is primarily controlled by the happens before
relations in a program: this means that if an action _A_ happens before an
action _B_, it is also guaranteed to appear before _B_ in _S_. Other than that
restriction, _S_ is unspecified and will be chosen arbitrarily during execution.
Once a particular _S_ has been established, every atomic's modification order is
then guaranteed to be consistent with it — this means that a `SeqCst` load will
never see a value that has been overwritten by a write that occurred before it
in _S_, or a value that has been written by a write that occurred after it in
_S_ (note that a `Relaxed`/`Acquire` load might, however, since it is not in _S_
in the first place and so has no “before” or “after”).
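For a single atomic this consistency can be exercised directly; there it already follows from coherence alone (even `Relaxed` provides it), and the point of _S_ is that it extends the same guarantee across all atomics. A sketch, with invented names:
```rust
use std::sync::atomic::{AtomicU32, Ordering::SeqCst};
use std::sync::Arc;
use std::thread;
// Loads a shared atomic twice while another thread stores 1 and then 2 into
// it, and checks that the two loads never go “backwards”.
fn check_no_going_back() {
    let x = Arc::new(AtomicU32::new(0));
    let writer = {
        let x = Arc::clone(&x);
        thread::spawn(move || {
            x.store(1, SeqCst); // overwritten by…
            x.store(2, SeqCst); // …this store in x's modification order
        })
    };
    let first = x.load(SeqCst);
    let second = x.load(SeqCst); // sequenced after `first`, so later in S
    // Forbidden: seeing the overwriting write, then the overwritten one.
    assert!(!(first == 2 && second == 1));
    writer.join().unwrap();
}
fn main() {
    for _ in 0..1000 {
        check_no_going_back();
    }
}
```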
So, looking back at our program, let's consider how we could use `SeqCst` to
make that execution invalid. As a refresher, here's the framework for every
possible execution of the program:
```text
a static X c d static Y b
╭─────────╮ ┌───────┐ ╭─────────╮ ╭─────────╮ ┌───────┐ ╭─────────╮
│ store X ├─┐ │ false │ ┌─┤ load X │ │ load Y ├─┐ │ false │ ┌─┤ store Y │
╰─────────╯ │ └───────┘ │ ╰────╥────╯ ╰────╥────╯ │ └───────┘ │ ╰─────────╯
└─┬───────┐ │ ╭────⇓────╮ ╭────⇓────╮ │ ┌───────┬─┘
│ true ├─┘ │ load Y ├─? ?─┤ load X │ └─┤ true │
└───────┘ ╰─────────╯ ╰─────────╯ └───────┘
```
First of all, both the final loads (`c` and `d`'s second operations) need to
become `SeqCst`, because they need to be aware of the total ordering that
determines whether `X` or `Y` becomes `true` first. And secondly, we need to
establish that ordering in the first place, which requires making sure that
there is always one operation in _S_ that both sees one of the atomics as `true`
and precedes both final loads (the final loads themselves don't work for this:
although they “know” that their corresponding atomic is `true`, they don't
interact with it directly, so _S_ doesn't care).
There are two operations in the program that could fulfill the first condition,
should they be made `SeqCst`: the stores of `true` and the first loads. However,
the second condition ends up ruling out using the stores, since in order to make
sure that they precede the final loads in _S_ it would be necessary to have the
first loads be `SeqCst` anyway (due to the mixed-`SeqCst` special case detailed
later), so in the end we can just leave them as `Relaxed`.
This leaves us with the correct version of the above program, which is
guaranteed to never panic:
```rust
# use std::sync::atomic::{AtomicBool, Ordering::{Relaxed, SeqCst}};
use std::thread;
static X: AtomicBool = AtomicBool::new(false);
static Y: AtomicBool = AtomicBool::new(false);
let a = thread::spawn(|| { X.store(true, Relaxed) });
let b = thread::spawn(|| { Y.store(true, Relaxed) });
let c = thread::spawn(|| { while !X.load(SeqCst) {} Y.load(SeqCst) });
let d = thread::spawn(|| { while !Y.load(SeqCst) {} X.load(SeqCst) });
let a = a.join().unwrap();
let b = b.join().unwrap();
let c = c.join().unwrap();
let d = d.join().unwrap();
// This assert is **not** allowed to fail.
assert!(c || d);
```
As there are four `SeqCst` operations, with a partial order between the two
pairs (caused by the sequenced-before relation), there are six possible
executions of this program:
- All of `c`'s loads precede `d`'s loads:
1. `c` loads `X` (gives `true`)
1. `c` loads `Y` (gives either `false` or `true`)
1. `d` loads `Y` (gives `true`)
1. `d` loads `X` (required to be `true`)
- Both initial loads precede both final loads:
1. `c` loads `X` (gives `true`)
1. `d` loads `Y` (gives `true`)
1. `c` loads `Y` (required to be `true`)
1. `d` loads `X` (required to be `true`)
- As above, but the final loads occur in a different order:
1. `c` loads `X` (gives `true`)
1. `d` loads `Y` (gives `true`)
1. `d` loads `X` (required to be `true`)
1. `c` loads `Y` (required to be `true`)
- As before, but the initial loads occur in a different order:
1. `d` loads `Y` (gives `true`)
1. `c` loads `X` (gives `true`)
1. `c` loads `Y` (required to be `true`)
1. `d` loads `X` (required to be `true`)
- As above, but the final loads occur in a different order:
1. `d` loads `Y` (gives `true`)
1. `c` loads `X` (gives `true`)
1. `d` loads `X` (required to be `true`)
1. `c` loads `Y` (required to be `true`)
- All of `d`'s loads precede `c`'s loads:
1. `d` loads `Y` (gives `true`)
1. `d` loads `X` (gives either `false` or `true`)
1. `c` loads `X` (gives `true`)
1. `c` loads `Y` (required to be `true`)
All the places where the load is required to give `true` are caused by a
preceding load in _S_ of the same atomic having seen `true`; anything else would
make _S_ inconsistent with that atomic's modification order, which is
impossible.
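To convince yourself empirically, the corrected program can also be run in a loop (the harness and iteration count here are arbitrary choices); since the assert is guaranteed by _S_, it can never fire:
```rust
use std::sync::atomic::{AtomicBool, Ordering::{Relaxed, SeqCst}};
use std::sync::Arc;
use std::thread;
// One run of the four-thread program; returns (c, d).
fn iteration() -> (bool, bool) {
    let x = Arc::new(AtomicBool::new(false));
    let y = Arc::new(AtomicBool::new(false));
    let a = { let x = Arc::clone(&x); thread::spawn(move || x.store(true, Relaxed)) };
    let b = { let y = Arc::clone(&y); thread::spawn(move || y.store(true, Relaxed)) };
    let c = {
        let (x, y) = (Arc::clone(&x), Arc::clone(&y));
        thread::spawn(move || { while !x.load(SeqCst) {} y.load(SeqCst) })
    };
    let d = {
        let (x, y) = (Arc::clone(&x), Arc::clone(&y));
        thread::spawn(move || { while !y.load(SeqCst) {} x.load(SeqCst) })
    };
    a.join().unwrap();
    b.join().unwrap();
    (c.join().unwrap(), d.join().unwrap())
}
fn main() {
    for _ in 0..200 {
        let (c, d) = iteration();
        assert!(c || d); // guaranteed by S
    }
}
```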
## The mixed-`SeqCst` special case
As I've been alluding to for a while, I wasn't being totally truthful when I
said that _S_ is consistent with happens before relations — in reality, it is
only consistent with _strongly happens before_ relations, which is a
subtly-defined subset of happens before relations. In particular, it excludes
two situations:
1. A `SeqCst` operation A synchronizes-with an `Acquire` or `AcqRel` operation
B, which is sequenced before another `SeqCst` operation C. Here, despite the
fact that A happens before C, A does not _strongly_ happen before C and so is
not guaranteed to precede C in _S_.
2. A `SeqCst` operation A is sequenced before a `Release` or `AcqRel`
operation B, which synchronizes-with another `SeqCst` operation C. Similarly,
despite the fact that A happens before C, A might not precede C in _S_.
The first situation is illustrated below, with `SeqCst` accesses represented
with asterisks:
```text
t_1 x t_2
╭─────╮ ┌─↘───┐ ╭─────╮
*A* ├─┘ │ 1 ├───→ B │
╰─────╯ └───┘ ╰──╥──╯
╭──⇓──╮
*C*
╰─────╯
```
A happens before, but does not strongly happen before, C — and anything
sequenced after C will get the same treatment (unless more synchronization is
used). This means that C is actually allowed to _precede_ A in _S_, despite
conceptually happening after it. However, anything sequenced before A will
strongly happen before C, because then there is at least one sequenced-before
edge on each side of the synchronization.
But this is all highly theoretical at the moment, so let's construct an example
to show how that rule can actually affect the execution of code. If C were to
precede A in _S_, then in the modification order of any atomic they both access,
C would have to come before A. Let's say then that C loads from `x` (the atomic
that A has to access); it may load the value that came before A if it were to
precede A in _S_:
```text
t_1 x t_2
╭─────╮ ┌───┐ ╭─────╮
*A* ├─┐ │ 0 ├─┐┌→ B │
╰─────╯ │ └───┘ ││╰──╥──╯
└─↘───┐┌─┘╭──⇓──╮
│ 1 ├┘└─→ *C*
└───┘ ╰─────╯
```
Ah wait no, that doesn't work, because coherence still mandates that `1` is the
only value that can be loaded. In fact, once `1` has been loaded, _S_'s required
consistency with modification orders means that A _is_ required to precede C in
_S_ after all.
So somehow, to observe this difference we need a _different_ `SeqCst`
operation, let's call it E, to be the one that loads from `x`: one that C is
guaranteed to precede in _S_ (so we can observe the “weird” state in between C
and A) but that C also doesn't happen before (to avoid coherence getting in the
way). To do that, all we have to do is have C appear before a `SeqCst` operation
D in the modification order of another atomic, but have D be a store so as to
avoid C synchronizing with it; our desired load E can then simply be sequenced
after D. This carries over the “precedes in _S_” guarantee, but does not restore
the happens-after relation to C, since that was already dropped by having D be a
store.
In diagram form, that looks like this:
```text
t_1 x t_2 helper t_3
╭─────╮ ┌───┐ ╭─────╮ ┌─────┐ ╭─────╮
*A* ├─┐ │ 0 ├┐┌─→ B │ ┌─┤ 0 │ ┌─┤ *D*
╰─────╯ │ └───┘││ ╰──╥──╯ │ └─────┘ │ ╰──╥──╯
│ └│────║────│─────────│┐ ║
└─↘───┐ │ ╭──⇓──╮ │ ┌─────↙─┘│╭──⇓──╮
│ 1 ├─┘ │ *C* ←─┘ │ 1 │ └→ *E*
└───┘ ╰─────╯ └─────┘ ╰─────╯
S = C → D → E → A
```
C is guaranteed to precede D in _S_, and D is guaranteed to precede E, but
because this exception means that A is _not_ guaranteed to precede C, A can
come at the very end, resulting in the surprising but totally valid outcome of E
loading `0` from `x`. In code, this means that the following is _not_ guaranteed
to panic:
```rust
# use std::sync::atomic::{AtomicU8, Ordering::{Acquire, SeqCst}};
# return;
static X: AtomicU8 = AtomicU8::new(0);
static HELPER: AtomicU8 = AtomicU8::new(0);
// thread_1
X.store(1, SeqCst); // A
// thread_2
assert_eq!(X.load(Acquire), 1); // B
assert_eq!(HELPER.load(SeqCst), 0); // C
// thread_3
HELPER.store(1, SeqCst); // D
assert_eq!(X.load(SeqCst), 0); // E
```
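Since the all-asserts-pass outcome is merely _allowed_ rather than guaranteed, a runnable harness (everything below, including the names, is an invented sketch) can only tally how often that outcome shows up; B's assert is replaced with a spin loop so that no run can fail spuriously:
```rust
use std::sync::atomic::{AtomicU8, Ordering::{Acquire, SeqCst}};
use std::sync::Arc;
use std::thread;
// One run of the execution above; returns (c, e). The “weird” pair (0, 0) is
// permitted by the mixed-SeqCst special case, but nothing forces any
// particular hardware to ever produce it.
fn iteration() -> (u8, u8) {
    let x = Arc::new(AtomicU8::new(0));
    let helper = Arc::new(AtomicU8::new(0));
    let t1 = { let x = Arc::clone(&x); thread::spawn(move || x.store(1, SeqCst)) }; // A
    let t2 = {
        let (x, helper) = (Arc::clone(&x), Arc::clone(&helper));
        thread::spawn(move || {
            while x.load(Acquire) != 1 {} // B: spin rather than assert
            helper.load(SeqCst) // C
        })
    };
    let t3 = {
        let x = Arc::clone(&x);
        thread::spawn(move || {
            helper.store(1, SeqCst); // D
            x.load(SeqCst) // E
        })
    };
    t1.join().unwrap();
    (t2.join().unwrap(), t3.join().unwrap())
}
fn main() {
    let weird = (0..1000).filter(|_| iteration() == (0, 0)).count();
    println!("(C, E) = (0, 0) observed in {weird}/1000 runs");
}
```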
The second situation listed above has very similar consequences. Its abstract
form is the following execution in which A is not guaranteed to precede C in
_S_, despite A happening before C:
```text
t_1 x t_2
╭─────╮ ┌─↘───┐ ╭─────╮
*A* │ │ │ 0 ├───→ *C*
╰──╥──╯ │ └───┘ ╰─────╯
╭──⇓──╮ │
│ B ├─┘
╰─────╯
```
Similarly to before, we can't just have A access `x` to show why A not
necessarily preceding C in _S_ matters; instead, we have to introduce a second
atomic and a third thread to break the happens before chain first. And finally,
a single relaxed load F is added at the end just to prove that the weird
execution actually happened (leaving `x` as 2 instead of 1).
```text
t_3 helper t_1 x t_2
╭─────╮ ┌─────┐ ╭─────╮ ┌───┐ ╭─────╮
*D* ├┐┌─┤ 0 │ ┌─┤ *A* │ │ 0 │ ┌─→ *C*
╰──╥──╯││ └─────┘ │ ╰──╥──╯ └───┘ │ ╰──╥──╯
║ └│─────────│────║─────┐ │ ║
╭──⇓──╮ │ ┌─────↙─┘ ╭──⇓──╮ ┌─↘───┐ │ ╭──⇓──╮
*E* ←─┘ │ 1 │ │ B ├─┘││ 1 ├─┘┌┤ F │
╰─────╯ └─────┘ ╰─────╯ │└───┘ │╰─────╯
└↘───┐ │
│ 2 ├──┘
└───┘
S = C → D → E → A
```
This execution mandates both C preceding A in _S_ and A happening-before C,
something that is only possible through these two mixed-`SeqCst` special
exceptions. It can be expressed in code as well:
```rust
# use std::sync::atomic::{AtomicU8, Ordering::{Release, Relaxed, SeqCst}};
# return;
static X: AtomicU8 = AtomicU8::new(0);
static HELPER: AtomicU8 = AtomicU8::new(0);
// thread_3
X.store(2, SeqCst); // D
assert_eq!(HELPER.load(SeqCst), 0); // E
// thread_1
HELPER.store(1, SeqCst); // A
X.store(1, Release); // B
// thread_2
assert_eq!(X.load(SeqCst), 1); // C
assert_eq!(X.load(Relaxed), 2); // F
```
If this seems ridiculously specific and obscure, that's because it is.
Originally, back in C++11, this special case didn't exist — but then six years
later it was discovered that in practice atomics on Power, Nvidia GPUs and
sometimes ARMv7 _would_ exhibit this behavior, and fixing the implementations
would make atomics significantly slower. So instead, in C++20 the committee
simply encoded it into the specification.
Generally, however, this rule is so complex that it's best to just avoid it
entirely by never mixing `SeqCst` and non-`SeqCst` operations on a single atomic
in the first place — or even better, by avoiding `SeqCst` entirely, since the
weaker orderings end up having less complex semantics and fewer gotchas.
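As an illustration of that advice, simple message passing, one of the most common uses of atomics, needs no `SeqCst` at all: a `Release` store paired with an `Acquire` load is enough. A minimal sketch, with invented names:
```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering::{Acquire, Relaxed, Release}};
use std::thread;
static DATA: AtomicU32 = AtomicU32::new(0);
static READY: AtomicBool = AtomicBool::new(false);
// Publishes 42 on one thread and reads it back on another; returns what the
// consumer saw, which is guaranteed to be 42.
fn send_and_receive() -> u32 {
    let producer = thread::spawn(|| {
        DATA.store(42, Relaxed); // write the payload…
        READY.store(true, Release); // …then publish it with a release store
    });
    let consumer = thread::spawn(|| {
        // The acquire load that sees `true` synchronizes-with the release
        // store, making the payload visible afterwards.
        while !READY.load(Acquire) {}
        DATA.load(Relaxed)
    });
    producer.join().unwrap();
    consumer.join().unwrap()
}
fn main() {
    assert_eq!(send_and_receive(), 42);
}
```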
