From afe0ee2bf015ef6ad8a74b6d56ea90d23ea86aa4 Mon Sep 17 00:00:00 2001 From: SabrinaJewson Date: Sun, 28 Aug 2022 19:58:46 +0100 Subject: [PATCH] Write the `SeqCst` section --- src/atomics/seqcst.md | 389 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 389 insertions(+) diff --git a/src/atomics/seqcst.md b/src/atomics/seqcst.md index a9f12e5..6be9593 100644 --- a/src/atomics/seqcst.md +++ b/src/atomics/seqcst.md @@ -1 +1,390 @@ # SeqCst + +`SeqCst` is probably the most interesting ordering, because it is simultaneously +the simplest and most complex atomic memory ordering in existence. It’s +simple, because if you do only use `SeqCst` everywhere then you can kind of +maybe pretend like the Abstract Machine has a concept of time; phrases like +“latest value” make sense, the program can be thought of as a set of steps that +interleave, there is a universal “now” and “before” and wouldn’t that be nice? +But it’s also the most complex, because as soon as look under the hood you +realize just how incredibly convoluted and hard to follow the actual rules +behind it are, and it gets really ugly really fast as soon as you try to mix it +with any other ordering. + +To understand `SeqCst`, we first have to understand the problem it exists to +solve. The first complexity is that this problem can only be observed in the +presence of at least four different threads _and_ two separate atomic variables; +anything less and it’s not possible to notice a difference. The common example +used to show where weaker orderings produce counterintuitive results is this: + +```rust +# use std::sync::atomic::{self, AtomicBool}; +use std::thread; + +// Set this to Relaxed, Acquire, Release, AcqRel, doesn’t matter — the result is +// the same (modulo panics caused by attempting acquire stores or release +// loads). +const ORDERING: atomic::Ordering = atomic::Ordering::Relaxed; + +static X: AtomicBool = AtomicBool::new(false); +static Y: AtomicBool = AtomicBool::new(false); + +let a = thread::spawn(|| { X.store(true, ORDERING) }); +let b = thread::spawn(|| { Y.store(true, ORDERING) }); +let c = thread::spawn(|| { while !X.load(ORDERING) {} Y.load(ORDERING) }); +let d = thread::spawn(|| { while !Y.load(ORDERING) {} X.load(ORDERING) }); + +let a = a.join().unwrap(); +let b = b.join().unwrap(); +let c = c.join().unwrap(); +let d = d.join().unwrap(); + +# return; +// This assert is allowed to fail. +assert!(c || d); +``` + +The basic setup of this code, for all of its possible executions, looks like +this: + +```text + a static X c d static Y b +╭─────────╮ ┌───────┐ ╭─────────╮ ╭─────────╮ ┌───────┐ ╭─────────╮ +│ store X ├─┐ │ false │ ┌─┤ load X │ │ load Y ├─┐ │ false │ ┌─┤ store Y │ +╰─────────╯ │ └───────┘ │ ╰────╥────╯ ╰────╥────╯ │ └───────┘ │ ╰─────────╯ + └─┬───────┐ │ ╭────⇓────╮ ╭────⇓────╮ │ ┌───────┬─┘ + │ true ├─┘ │ load Y ├─? ?─┤ load X │ └─┤ true │ + └───────┘ ╰─────────╯ ╰─────────╯ └───────┘ +``` + +In other words, `a` and `b` are guaranteed to, at some point, store `true` into +`X` and `Y` respectively, and `c` and `d` are guaranteed to, at some point, load +those values of `true` from `X` and `Y` (there could also be an arbitrary number +of loads of `false` by `c` and `d`, but they’ve been omitted since they don’t +actually affect the execution at all). The question now is when `c` and `d` load +from `Y` and `X` respectively, is it possible for them _both_ to load `false`? + +And looking at this diagram, there’s absolutely no reason why not. There isn’t +even a single arrow connecting the left and right hand sides so far, so the load +has no restrictions on which value it is allowed to pick — and this goes for +both sides equally, so we could end up with an execution like this: + +```text + a static X c d static Y b +╭─────────╮ ┌───────┐ ╭─────────╮ ╭─────────╮ ┌───────┐ ╭─────────╮ +│ store X ├─┐ │ false ├┐ ┌┤ load X │ │ load Y ├┐ ┌┤ false │ ┌─┤ store Y │ +╰─────────╯ │ └───────┘│ │╰────╥────╯ ╰────╥────╯│ │└───────┘ │ ╰─────────╯ + └─┬───────┐└─│─────║──────┐┌─────║─────│─┘┌───────┬─┘ + │ true ├──┘╭────⇓────╮┌─┘╭────⇓────╮└──┤ true │ + └───────┘ │ load Y ├┘└─┤ load X │ └───────┘ + ╰─────────╯ ╰─────────╯ +``` + +Which results in a failed assert. This execution is brought about because the +model of separate modification orders means that there is no relative ordering +between `X` and `Y` being changed, and so each thread is allowed to “see” either +order. However, some algorithms will require a globally agreed-upon ordering, +and this is where `SeqCst` can come in useful. + +This ordering, first and foremost, inherits the guarantees from all the other +orderings — it is an acquire operation for loads, a release operation for stores +and an acquire-release operation for RMWs. In addition to this, it gives some +guarantees unique to `SeqCst` about what values it is allowed to load. Note that +these guarantees are not about preventing data races: unless you have some +unrelated code that triggers a data race given an unexpected condition, using +`SeqCst` can only prevent you from race conditions because its guarantees only +apply to other `SeqCst` operations rather than all data accesses. + +## S + +`SeqCst` is fundamentally about _S_, which is the global ordering of all +`SeqCst` operations in an execution of the program. It is consistent between +every atomic and every thread, and all stores, fences and RMWs that use a +sequentially consistent ordering have a place in it (but no other operations +do). It is in contrast to modification orders, which are similarly total but +only scoped to a single atomic rather than the whole program. + +Other than an edge case involving `SeqCst` mixed with weaker orderings (detailed +in the next section), _S_ is primarily controlled by the happens before +relations in a program: this means that if an action _A_ happens before an +action _B_, it is also guaranteed to appear before _B_ in _S_. Other than that +restriction, _S_ is unspecified and will be chosen arbitrarily during execution. + +Once a particular _S_ has been established, every atomic’s modification order is +then guaranteed to be consistent with it — this means that a `SeqCst` load will +never see a value that has been overwritten by a write that occurred before it +in _S_, or a value that has been written by a write that occured after it in +_S_ (note that a `Relaxed`/`Acquire` load however might, since there is no +“before” or “after” as it is not in _S_ in the first place). + +So, looking back at our program, let’s consider how we could use `SeqCst` to +make that execution invalid. As a refresher, here’s the framework for every +possible execution of the program: + +```text + a static X c d static Y b +╭─────────╮ ┌───────┐ ╭─────────╮ ╭─────────╮ ┌───────┐ ╭─────────╮ +│ store X ├─┐ │ false │ ┌─┤ load X │ │ load Y ├─┐ │ false │ ┌─┤ store Y │ +╰─────────╯ │ └───────┘ │ ╰────╥────╯ ╰────╥────╯ │ └───────┘ │ ╰─────────╯ + └─┬───────┐ │ ╭────⇓────╮ ╭────⇓────╮ │ ┌───────┬─┘ + │ true ├─┘ │ load Y ├─? ?─┤ load X │ └─┤ true │ + └───────┘ ╰─────────╯ ╰─────────╯ └───────┘ +``` + +First of all, both the final loads (`c` and `d`’s second operations) need to +become `SeqCst`, because they need to be aware of the total ordering that +determines whether `X` or `Y` becomes `true` first. And secondly, we need to +establish that ordering in the first place, and that needs to be done by making +sure that there is always one operation in _S_ that both sees one of the atomics +as `true` and precedes both final loads (the final loads themselves don’t work +for this since although they “know” that their corresponding atomic is `true` +they don’t interact with it directly so _S_ doesn’t care). + +There are two operations in the program that could fulfill the first condition, +should they be made `SeqCst`: the stores of `true` and the first loads. However, +the second condition ends up ruling out using the stores, since in order to make +sure that they precede the final loads in _S_ it would be necessary to have the +first loads be `SeqCst` anyway (due to the mixed-`SeqCst` special case detailed +later), so in the end we can just leave them as `Relaxed`. + +This leaves us with the correct version of the above program, which is +guaranteed to never panic: + +```rust +# use std::sync::atomic::{AtomicBool, Ordering::{Relaxed, SeqCst}}; +use std::thread; + +static X: AtomicBool = AtomicBool::new(false); +static Y: AtomicBool = AtomicBool::new(false); + +let a = thread::spawn(|| { X.store(true, Relaxed) }); +let b = thread::spawn(|| { Y.store(true, Relaxed) }); +let c = thread::spawn(|| { while !X.load(SeqCst) {} Y.load(SeqCst) }); +let d = thread::spawn(|| { while !Y.load(SeqCst) {} X.load(SeqCst) }); + +let a = a.join().unwrap(); +let b = b.join().unwrap(); +let c = c.join().unwrap(); +let d = d.join().unwrap(); + +// This assert is **not** allowed to fail. +assert!(c || d); +``` + +As there are four `SeqCst` operations with a partial order between two pairs in +them (caused by the sequenced before relation), there are six possible +executions of this program: + +- All of `c`’s loads precede `d`’s loads: + 1. `c` loads `X` (gives `true`) + 1. `c` loads `Y` (gives either `false` or `true`) + 1. `d` loads `Y` (gives `true`) + 1. `d` loads `X` (required to be `true`) +- Both initial loads precede both final loads: + 1. `c` loads `X` (gives `true`) + 1. `d` loads `Y` (gives `true`) + 1. `c` loads `Y` (required to be `true`) + 1. `d` loads `X` (required to be `true`) +- As above, but the final loads occur in a different order: + 1. `c` loads `X` (gives `true`) + 1. `d` loads `Y` (gives `true`) + 1. `d` loads `X` (required to be `true`) + 1. `c` loads `Y` (required to be `true`) +- As before, but the initial loads occur in a different order: + 1. `d` loads `Y` (gives `true`) + 1. `c` loads `X` (gives `true`) + 1. `c` loads `Y` (required to be `true`) + 1. `d` loads `X` (required to be `true`) +- As above, but the final loads occur in a different order: + 1. `d` loads `Y` (gives `true`) + 1. `c` loads `X` (gives `true`) + 1. `d` loads `X` (required to be `true`) + 1. `c` loads `Y` (required to be `true`) +- All of `d`’s loads precede `c`’s loads: + 1. `d` loads `Y` (gives `true`) + 1. `d` loads `X` (gives either `false` or `true`) + 1. `c` loads `X` (gives `true`) + 1. `c` loads `Y` (required to be `true`) + +All the places were the load is requied to give `true` were caused by a +preceding load in _S_ of the same atomic which saw `true`, because otherwise _S_ +would be inconsistent with the atomic’s modification order and that is +impossible. + +## The mixed-`SeqCst` special case + +As I’ve been alluding to for a while, I wasn’t being totally truthful when I +said that _S_ is consistent with happens before relations — in reality, it is +only consistent with _strongly happens before_ relations, which presents a +subtly-defined subset of happens before relations. In particular, it excludes +two situations: + +1. The `SeqCst` operation A synchronizes-with an `Acquire` or `AcqRel` operation + B which is sequenced before another `SeqCst` operation C. Here, despite the + fact that A happens before C, A does not _strongly_ happen before C and so is + there not guaranteed to precede C in _S_. +2. The `SeqCst` operation A is sequenced-before the `Release` or `AcqRel` + operation B, which synchronizes-with another `SeqCst` operation C. Similarly, + despite the fact that A happens before C, A might not precede C in _S_. + +The first situation is illustrated below, with `SeqCst` accesses repesented with +asterisks: + +```text + t_1 x t_2 +╭─────╮ ┌─↘───┐ ╭─────╮ +│ *A* ├─┘ │ 1 ├───→ B │ +╰─────╯ └───┘ ╰──╥──╯ + ╭──⇓──╮ + │ *C* │ + ╰─────╯ +``` + +A happens before, but does not strongly happen before, C — and anything +sequenced after C will have the same treatment (unless more synchronization is +used). This means that C is actually allowed to _precede_ A in _S_, despite +conceptually happening after it. However, anything sequenced before A, because +there is at least one sequence on either side of the synchronization, will +strongly happen before C. + +But this is all highly theoretical at the moment, so let’s make an example to +show how that rule can actually affect the execution of code. So, if C were to +precede A in _S_ then that means in the modification order of any atomic they +both access, C would have to come before A. Let’s say then that C loads from `x` +(the atomic that A has to access), it may load the value that came before A if +it were to precede A in _S_: + +```text + t_1 x t_2 +╭─────╮ ┌───┐ ╭─────╮ +│ *A* ├─┐ │ 0 ├─┐┌→ B │ +╰─────╯ │ └───┘ ││╰──╥──╯ + └─↘───┐┌─┘╭──⇓──╮ + │ 1 ├┘└─→ *C* │ + └───┘ ╰─────╯ +``` + +Ah wait no, that doesn’t work because coherence still mandates that `1` is the +only value that can be loaded. In fact, once `1` is loaded _S_’s required +consistency with modification orders means that A _is_ required to precede C in +_S_ after all. + +So somehow, to observe this difference we need to have a _different_ `SeqCst` +operation, let’s call it E, be the one that loads from `x`, where C is +guaranteed to precede it in _S_ (so we can observe the “weird” state in between +C and A) but C also doesn’t happen before it (to avoid coherence getting in the +way) — and to do that, all we have to do is have C appear before a `SeqCst` +operation D in the modification order of another atomic, but have D be a store +so as to avoid C synchronizing with it, and then our desired load E can simply +be sequenced after D (this will carry over the “precedes in _S_” guarantee, but +does not restore the happens after relation to C since that was already dropped +by having D be a store). + +In diagram form, that looks like this: + +```text + t_1 x t_2 helper t_3 +╭─────╮ ┌───┐ ╭─────╮ ┌─────┐ ╭─────╮ +│ *A* ├─┐ │ 0 ├┐┌─→ B │ ┌─┤ 0 │ ┌─┤ *D* │ +╰─────╯ │ └───┘││ ╰──╥──╯ │ └─────┘ │ ╰──╥──╯ + │ └│────║────│─────────│┐ ║ + └─↘───┐ │ ╭──⇓──╮ │ ┌─────↙─┘│╭──⇓──╮ + │ 1 ├─┘ │ *C* ←─┘ │ 1 │ └→ *E* │ + └───┘ ╰─────╯ └─────┘ ╰─────╯ + +S = C → D → E → A +``` + +C is guaranteed to precede D in _S_, and D is guaranteed to precede E, but +because this exception means that A is _not_ guaranteed to precede C, it is +totally possible for it to come at the end, resulting in the surprising but +totally valid outcome of E loading `0` from `x`. In code, this can be expressed +as the following code _not_ being guaranteed to panic: + +```rust +# use std::sync::atomic::{AtomicU8, Ordering::{Acquire, SeqCst}}; +# return; +static X: AtomicU8 = AtomicU8::new(0); +static HELPER: AtomicU8 = AtomicU8::new(0); + +// thread_1 +X.store(1, SeqCst); // A + +// thread_2 +assert_eq!(X.load(Acquire), 1); // B +assert_eq!(HELPER.load(SeqCst), 0); // C + +// thread_3 +HELPER.store(1, SeqCst); // D +assert_eq!(X.load(SeqCst), 0); // E +``` + +The second situation listed above has very similar consequences. Its abstract +form is the following execution in which A is not guaranteed to precede C in +_S_, despite A happening before C: + +```text + t_1 x t_2 +╭─────╮ ┌─↘───┐ ╭─────╮ +│ *A* │ │ │ 0 ├───→ *C* │ +╰──╥──╯ │ └───┘ ╰─────╯ +╭──⇓──╮ │ +│ B ├─┘ +╰─────╯ +``` + +Similarly to before, we can’t just have A access `x` to show why A not +necessarily preceding C in _S_ matters; instead, we have to introduce a second +atomic and third thread to break the happens before chain first. And finally, a +single relaxed load F at the end is added just to prove that the weird execution +actually happened (leaving `x` as 2 instead of 1). + +```text + t_3 helper t_1 x t_2 +╭─────╮ ┌─────┐ ╭─────╮ ┌───┐ ╭─────╮ +│ *D* ├┐┌─┤ 0 │ ┌─┤ *A* │ │ 0 │ ┌─→ *C* │ +╰──╥──╯││ └─────┘ │ ╰──╥──╯ └───┘ │ ╰──╥──╯ + ║ └│─────────│────║─────┐ │ ║ +╭──⇓──╮ │ ┌─────↙─┘ ╭──⇓──╮ ┌─↘───┐ │ ╭──⇓──╮ +│ *E* ←─┘ │ 1 │ │ B ├─┘││ 1 ├─┘┌┤ F │ +╰─────╯ └─────┘ ╰─────╯ │└───┘ │╰─────╯ + └↘───┐ │ + │ 2 ├──┘ + └───┘ +S = C → D → E → A +``` + +This execution mandates both C preceding A in _S_ and A happening-before C, +something that is only possible through these two mixed-`SeqCst` special +exceptions. It can be expressed in code as well: + +```rust +# use std::sync::atomic::{AtomicU8, Ordering::{Release, Relaxed, SeqCst}}; +# return; +static X: AtomicU8 = AtomicU8::new(0); +static HELPER: AtomicU8 = AtomicU8::new(0); + +// thread_3 +X.store(2, SeqCst); // D +assert_eq!(HELPER.load(SeqCst), 0); // E + +// thread_1 +HELPER.store(1, SeqCst); // A +X.store(1, Release); // B + +// thread_2 +assert_eq!(X.load(SeqCst), 1); // C +assert_eq!(X.load(Relaxed), 2); // F +``` + +If this seems ridiculously specific and obscure, that’s because it is. +Originally, back in C++11, this special case didn’t exist — but then six years +later it was discovered that in practice atomics on Power, Nvidia GPUs and +sometimes ARMv7 _would_ have this special case, and fixing the implementations +would make atomics significantly slower. So instead, in C++20 they simply +encoded it into the specification. + +Generally however, this rule is so complex it’s best to just avoid it entirely +by never mixing `SeqCst` and non-`SeqCst` on a single atomic in the first place +— or even better, just avoiding `SeqCst` entirely and using a stronger ordering +instead that has less complex semantics and fewer gotchas.