Write “Multithreaded Execution” and add simplified atomic spec

2 years ago · 0347b0183f
parent 8d1e4dccf7
commit 0347b0183f
9 changed files with 651 additions and 7 deletions
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@ -41,7 +41,14 @@
 * [Concurrency](concurrency.md)
  * [Races](races.md)
  * [Send and Sync](send-and-sync.md)
-  * [Atomics](atomics.md)
+  * [Atomics](./atomics/atomics.md)
+	* [Multithreaded Execution](./atomics/multithread.md)
+	* [Relaxed](./atomics/relaxed.md)
+	* [Acquire and Release](./atomics/acquire-release.md)
+	* [SeqCst](./atomics/seqcst.md)
+	* [Fences](./atomics/fences.md)
+	* [Signals](./atomics/signals.md)
+	* [Specification](./atomics/specification.md)
 * [Implementing Vec](./vec/vec.md)
  * [Layout](./vec/vec-layout.md)
  * [Allocating](./vec/vec-alloc.md)
--- a/src/atomics/acquire-release.md
+++ b/src/atomics/acquire-release.md
@ -0,0 +1 @@
+# Acquire and Release
--- a/src/atomics/atomics.md
+++ b/src/atomics/atomics.md
@ -17,12 +17,24 @@ details, you should check out the [C++ specification][C++-model].
 Still, we'll try to cover the basics and some of the problems Rust developers
 face.

-The C++ memory model is fundamentally about trying to bridge the gap between the
-semantics we want, the optimizations compilers want, and the inconsistent chaos
-our hardware wants. *We* would like to just write programs and have them do
-exactly what we said but, you know, fast. Wouldn't that be great?
+## Motivation

-## Compiler Reordering
+The C++ memory model is very large and confusing with lots of seemingly
+arbitrary design decisions. To understand the motivation behind this, it can
+help to look at what got us in this situation in the first place. There are
+three main factors at play here:
+
+1. Users of the language, who want fast, cross-platform code;
+2. compilers, who want to optimize code to make it fast;
+3. and the hardware, which is ready to unleash a wrath of inconsistent chaos on
+  your program at a moment's notice.
+
+The C++ memory model is fundamentally about trying to bridge the gap between
+these three, allowing users to write code for a logical and consistent abstract
+machine while the compiler and hardware deal with the madness underneath that
+makes it run fast.
+
+### Compiler Reordering

 Compilers fundamentally want to be able to do all sorts of complicated
 transformations to reduce data dependencies and eliminate dead code. In
@ -53,7 +65,7 @@ able to make these kinds of optimizations, because they can seriously improve
 performance. On the other hand, we'd also like to be able to depend on our
 program *doing the thing we said*.

-## Hardware Reordering
+### Hardware Reordering

 On the other hand, even if the compiler totally understood what we wanted and
 respected our wishes, our hardware might instead get us in trouble. Trouble
@ -106,6 +118,8 @@ programming:
  incorrect. If possible, concurrent algorithms should be tested on
  weakly-ordered hardware.

+---
+
 ## Data Accesses

 The C++ memory model attempts to bridge the gap by allowing us to talk about the
--- a/src/atomics/fences.md
+++ b/src/atomics/fences.md
@ -0,0 +1 @@
+# Fences
--- a/src/atomics/multithread.md
+++ b/src/atomics/multithread.md
@ -0,0 +1,220 @@
+# Multithreaded Execution
+
+The first important thing to understand about C++20 atomics is that **the
+abstract machine has no concept of time**. You might expect there to be a single
+global ordering of events across the program where each happens at the same time
+or one after the other, but under the abstract model no such ordering exists;
+instead, a possible execution of the program must be treated as a single event
+that happens instantaneously — there is never any such thing as “now”, or a
+“latest value”, and using that terminology will only lead you to more confusion.
+(Of course, in reality there does exist a concept of time, but you must keep in
+mind that you’re not programming for the hardware, you’re programming for the
+AM.)
+
+However, while no global ordering of operations exists _between_ threads, there
+does exist a single total ordering _within_ each thread, which is known as its
+_sequence_. For example, given this simple Rust program:
+
+```rs
+println!("A");
+println!("B");
+```
+
+its sequence during one possible execution can be visualized like so:
+
+```text
+╭───────────────╮
+│ println!("A") │
+╰───────╥───────╯
+╭───────⇓───────╮
+│ println!("B") │
+╰───────────────╯
+```
+
+That double arrow in between the two boxes (`⇓`) represents that the second
+statement is _sequenced after_ the first (and similarly the first statement is
+_sequenced before_ the second). This is the strongest kind of ordering guarantee
+between any two operations, and only comes about when those two operations
+happen one after the other and on the same thread.
+
+If we add a second thread to the mix:
+
+```rs
+// Thread 1:
+println!("A");
+println!("B");
+// Thread 2:
+eprintln!("01");
+eprintln!("02");
+```
+
+it will simply coexist in parallel, with each thread getting its own independent
+sequence:
+
+```text
+    Thread 1              Thread 2
+╭───────────────╮    ╭─────────────────╮
+│ println!("A") │    │ eprintln!("01") │
+╰───────╥───────╯    ╰────────╥────────╯
+╭───────⇓───────╮    ╭────────⇓────────╮
+│ println!("B") │    │ eprintln!("02") │
+╰───────────────╯    ╰─────────────────╯
+```
+
+Note that this is **not** a representation of multiple things that _could_
+happen at runtime — instead, this diagram describes exactly what _did_ happen
+when the program ran once. This distinction is key, because it highlights that
+even the lowest-level representation of a program’s execution does not have
+a global ordering between threads; those two disconnected chains are all there
+is.
+
+Now let’s make things more interesting by introducing some shared data, and have
+both threads read it.
+
+```rs
+// Initial state
+let data = 0;
+// Thread 1:
+data;
+// Thread 2:
+data;
+```
+
+Each memory location, similarly to threads, can be shown as another column on
+our diagram, but holding values instead of instructions, and each access (read
+or write) manifests as a line from the instruction that performed the access to
+the associated value in the column. So this code can produce (and is in fact
+guaranteed to produce) the following execution:
+
+```text
+Thread 1     data     Thread 2
+╭──────╮    ┌────┐    ╭──────╮
+│ data ├╌╌╌╌┤  0 ├╌╌╌╌┤ data │
+╰──────╯    └────┘    ╰──────╯
+```
+
+That is, both threads read the same value of `0` from `data`, with no relative
+ordering between them. This is the simple case, for when the data doesn’t ever
+change — but that’s no fun, so let’s add some mutability in the mix (we’ll also
+return to a single thread, just to keep things simple).
+
+Consider this code, which we’re going to attempt to draw a diagram for like
+above:
+
+```rs
+let mut data = 0;
+data = 1;
+data;
+data = 2;
+```
+
+Working out executions of code like this is rather like solving a Sudoku puzzle:
+you must first lay out all the facts that you know, and then fill in the blanks
+with logical reasoning. The initial information we’ve been given is both the
+initial value of `data` and the sequential order of Thread 1; we also know that
+over its lifetime, `data` takes on a total of three different values that were
+caused by two different non-atomic writes. This allows us to start drawing out
+some boxes:
+
+```text
+ Thread 1        data
+╭───────╮       ┌────┐
+│  = 1  ├╌?     │  0 │
+╰───╥───╯     ?╌┼╌╌╌╌┤
+╭───⇓───╮     ?╌┼╌╌╌╌┤
+│  data ├╌?     │  ? │
+╰───╥───╯     ?╌┼╌╌╌╌┤
+╭───⇓───╮     ?╌┼╌╌╌╌┤
+│  = 2  ├╌?     │  ? │
+╰───────╯       └────┘
+```
+
+Note the use of dashed padding in between the values of `data`’s column. Those
+spaces won’t ever contain a value, but they’re used to represent an
+unsynchronized (non-atomic) write — it is garbage data and attempting to read it
+would result in a data race.
+
+To solve this puzzle, we first need to bring in a new rule that governs all
+memory accesses to a particular location:
+> From the point at which the access occurs, find every other point that can be
+> reached by following the reverse direction of arrows, then for each one of
+> those, take a single step across every line that connects to the relevant
+> memory location. **It is not allowed for the access to read or write any value
+> that appears above any one of these points**.
+
+In our case, there are two potential executions: one, where the first write
+corresponds to the first value in `data`, and two, where the first write
+corresponds to the second value in `data`. Considering the second case for a
+moment, it would also force the second write to correspond to the first
+value in `data`. Therefore its diagram would look something like this:
+
+```text
+ Thread 1        data
+╭───────╮       ┌────┐
+│  = 1  ├╌╌┐    │  0 │
+╰───╥───╯  ┊ ┌╌╌┼╌╌╌╌┤
+╭───⇓───╮  ┊ ├╌╌┼╌╌╌╌┤
+│  data ├╌?┊ ┊  │  2 │
+╰───╥───╯  ├╌┼╌╌┼╌╌╌╌┤
+╭───⇓───╮  └╌┼╌╌┼╌╌╌╌┤
+│  = 2  ├╌╌╌╌┘  │  1 │
+╰───────╯       └────┘
+```
+
+However, that second line breaks the rule we just established! Following up the
+arrows from the third operation in Thread 1, we reach the first operation, and
+from there we can take a single step to reach the space in between the `2` and
+the `1`, which excludes this access from writing any value above that point.
+
+So evidently, this execution is no good. We can therefore conclude that the only
+possible execution of this program is the other one, in which the `1` appears
+above the `2`:
+
+```text
+ Thread 1     data
+╭───────╮     ┌────┐
+│  = 1  ├╌╌┐  │  0 │
+╰───╥───╯  ├╌╌┼╌╌╌╌┤
+╭───⇓───╮  └╌╌┼╌╌╌╌┤
+│  data ├╌?   │  1 │
+╰───╥───╯  ┌╌╌┼╌╌╌╌┤
+╭───⇓───╮  ├╌╌┼╌╌╌╌┤
+│  = 2  ├╌╌┘  │  2 │
+╰───────╯     └────┘
+```
+
+Now to sort out the read operation in the middle. We can use the same rule as
+before to trace up to the first write and rule out us reading either the `0`
+value or the garbage that exists between it and `1`, but how do we choose
+between the `1` and the `2`? Well, as it turns out there is a complement to the
+rule we already defined which gives us the exact answer we need:
+
+> From the point at which the access occurs, find every other point that can be
+> reached by following the _forward_ direction of arrows, then for each one of
+> those, take a single step across every line that connects to the relevant
+> memory location. **It is not allowed for the access to read or write any value
+> that appears below any one of these points**.
+
+Using this rule, we can follow the arrow downwards and then across and finally
+rule out `2` as well as the garbage before it. This leaves us with exactly _one_
+value that the read operation can return, and exactly one possible execution
+guaranteed by the Abstract Machine:
+
+```text
+ Thread 1      data
+╭───────╮     ┌────┐
+│  = 1  ├╌╌┐  │  0 │
+╰───╥───╯  ├╌╌┼╌╌╌╌┤
+╭───⇓───╮  └╌╌┼╌╌╌╌┤
+│  data ├╌╌╌╌╌┤  1 │
+╰───╥───╯  ┌╌╌┼╌╌╌╌┤
+╭───⇓───╮  ├╌╌┼╌╌╌╌┤
+│  = 2  ├╌╌┘  │  2 │
+╰───────╯     └────┘
+```
+
+You might be thinking that all this has been is the longest, most convoluted
+explanation ever of the most basic intuitive semantics of programming — and
+you’d be absolutely right. But it’s essential to grasp these fundamentals,
+because once you have this model in mind, the extension into multiple threads
+and the complicated semantics of real atomics becomes completely natural.
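A minimal sketch of the two cases worked through in `multithread.md`, written as compilable Rust rather than the abbreviated snippets used in the chapter. The function names `single_thread_read` and `shared_reads` are hypothetical and are not part of this commit. The first function is the write, read, write sequence whose only allowed execution has the read observe `1`; the second is the two-thread case where the data never changes, so both reads observe `0`.

```rust
use std::thread;

/// Single-threaded case: the read is sequenced after `data = 1` and
/// sequenced before `data = 2`, so the only value it can observe is `1`.
pub fn single_thread_read() -> i32 {
    let mut data = 0;
    data = 1;
    let observed = data;
    data = 2;
    let _ = data; // read the final value so the last write is not flagged as unused
    observed
}

/// Two threads read shared data that is never written after initialization.
/// With no writes, there is exactly one value either read can observe.
pub fn shared_reads() -> (i32, i32) {
    let data = 0;
    thread::scope(|s| {
        // Scoped threads may borrow `data` because they are joined before
        // `scope` returns.
        let t1 = s.spawn(|| data);
        let t2 = s.spawn(|| data);
        (t1.join().unwrap(), t2.join().unwrap())
    })
}
```

Calling `single_thread_read()` yields `1` and `shared_reads()` yields `(0, 0)`, matching the final diagrams for each case.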
--- a/src/atomics/relaxed.md
+++ b/src/atomics/relaxed.md
@ -0,0 +1,43 @@
+# Relaxed
+
+Now we’ve got single-threaded mutation semantics out of the way, we can try
+reintroducing a second thread. We’ll have one thread perform a write to the
+memory location, and a second thread read from it, like so:
+
+```rs
+// Initial state
+let mut data = 0;
+// Thread 1:
+data = 1;
+// Thread 2:
+data;
+```
+
+Of course, any Rust programmer will immediately tell you that this code doesn’t
+compile, and for good reason. But suspend your disbelief for a moment, and
+imagine what would happen if it did. Let’s draw a diagram, leaving out the
+reading lines for now:
+
+```text
+Thread 1     data    Thread 2
+╭───────╮   ┌────┐   ╭───────╮
+│  = 1  ├╌┐ │  0 │ ?╌┤  data │
+╰───────╯ ├╌┼╌╌╌╌┤   ╰───────╯
+          └╌┼╌╌╌╌┤
+            │  1 │
+            └────┘
+```
+
+Let’s try to figure out where the line in Thread 2’s access joins up. The rules
+from before don’t help us much unfortunately since there are no arrows
+connecting that operation to anything, so we can’t immediately rule anything
+out. As a result, we end up facing a situation we haven’t faced before: there is
+_more than one_ potential value for Thread 2 to read.
+
+And this is where we encounter the big limitation with unsynchronized data
+accesses: the price we pay for their speed and optimization capability is that
+this situation is considered **Undefined Behavior**. For an unsynchronized read
+to be acceptable, there has to be _exactly one_ potential value for it to read,
+and when there are multiple like in this situation it is considered a data race.
+
+## “Out-of-thin-air” values
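The write/read pair at the start of `relaxed.md` cannot be expressed with plain variables in safe Rust. As a hedged sketch, the same shape can be written with `AtomicU32` and `Ordering::Relaxed`. This is a swapped-in technique, not code from this commit, and the name `relaxed_write_and_read` is hypothetical. Because both accesses are atomic, there is no data race: the reader may still observe more than one value, but each outcome is defined.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// One thread writes `1`, another reads. Nothing orders the two accesses,
/// so the read may observe either the initial `0` or the written `1`.
pub fn relaxed_write_and_read() -> u32 {
    let data = AtomicU32::new(0);
    thread::scope(|s| {
        // Thread 1: relaxed store.
        s.spawn(|| data.store(1, Ordering::Relaxed));
        // Thread 2: relaxed load with no ordering relative to the store.
        let reader = s.spawn(|| data.load(Ordering::Relaxed));
        reader.join().unwrap()
    })
}
```

The result is either `0` or `1`; which one appears can differ between runs, so callers should not depend on a particular value.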
--- a/src/atomics/seqcst.md
+++ b/src/atomics/seqcst.md
@ -0,0 +1 @@
+# SeqCst
--- a/src/atomics/signals.md
+++ b/src/atomics/signals.md
@ -0,0 +1,3 @@
+# Signals
+
+(and compiler fences)
--- a/src/atomics/specification.md
+++ b/src/atomics/specification.md
@ -0,0 +1,354 @@
+# Specification
+
+Below is a modified C++20 specification draft (as it was on 2022-07-16), edited
+to remove C++-only features like consume orderings and `sig_atomic_t`.
+
+Note that although this has been checked, atomics are very difficult to get
+right and so there may be subtle mistakes. If you want to more formally check
+your software, read the [\[intro.races\]], [\[atomics.order\]] and
+[\[atomics.fences\]] sections of the real C++ specification.
+
+[\[intro.races\]]: https://eel.is/c++draft/intro.races
+[\[atomics.order\]]: https://eel.is/c++draft/atomics.order
+[\[atomics.fences\]]: https://eel.is/c++draft/atomics.fences
+
+## Data races
+
+The value of an object visible to a thread _T_ at a particular point is the
+initial value of the object, a value assigned to the object by _T_, or a value
+assigned to the object by another thread, according to the rules below.
+
+> _Note 1_: In some cases, there might instead be undefined behavior. Much of
+> this subclause is motivated by the desire to support atomic operations with
+> explicit and detailed visibility constraints. However, it also implicitly
+> supports a simpler view for more restricted programs.
+
+Two expression evaluations _conflict_ if one of them modifies a memory location
+and the other one reads or modifies the same memory location.
+
+The library defines a number of atomic operations and operations on mutexes that
+are specially identified as synchronization operations. These operations play a
+special role in making assignments in one thread visible to another. A
+synchronization operation on one or more memory locations is either an acquire
+operation, a release operation, or both an acquire and release operation. A
+synchronization operation without an associated memory location is a fence and
+can be either an acquire fence, a release fence, or both an acquire and release
+fence. In addition, there are relaxed atomic operations, which are not
+synchronization operations, and atomic read-modify-write operations, which have
+special characteristics.
+
+> _Note 2_: For example, a call that acquires a mutex will perform an acquire
+> operation on the locations comprising the mutex. Correspondingly, a call that
+> releases the same mutex will perform a release operation on those same
+> locations. Informally, performing a release operation on _A_ forces prior side
+> effects on other memory locations to become visible to other threads that
+> later perform an acquire operation on _A_. “Relaxed” atomic operations are not
+> synchronization operations even though, like synchronization operations, they
+> cannot contribute to data races.
+
+All modifications to a particular atomic object _M_ occur in some particular
+total order, called the _modification order_ of _M_.
+
+> _Note 3_: There is a separate order for each atomic object. There is no
+> requirement that these can be combined into a single total order for all
+> objects. In general this will be impossible since different threads can
+> observe modifications to different objects in inconsistent orders.
+
+A _release sequence_ headed by a release operation _A_ on an atomic object _M_
+is a maximal contiguous sub-sequence of side effects in the modification order
+of _M_, where the first operation is _A_, and every subsequent operation is an
+atomic read-modify-write operation.
+
+Certain library calls _synchronize with_ other library calls performed by
+another thread. For example, an atomic store-release synchronizes with a
+load-acquire that takes its value from the store.
+
+> _Note 4_: Except in the specified cases, reading a later value does not
+> necessarily ensure visibility as described below. Such a requirement would
+> sometimes interfere with efficient implementation.
+
+> _Note 5_: The specifications of the synchronization operations define when one
+> reads the value written by another. For atomic objects, the definition is
+> clear. All operations on a given mutex occur in a single total order. Each
+> mutex acquisition “reads the value written” by the last mutex release.
+
+An evaluation _A_ _happens before_ an evaluation _B_ (or, equivalently, _B_
+_happens after_ _A_) if either:
+- _A_ is sequenced before _B_, or
+- _A_ synchronizes with _B_, or
+- for some evaluation _X_, _A_ happens before _X_ and _X_ happens before _B_.
+
+An evaluation _A_ _strongly happens before_ an evaluation _D_ if either:
+- _A_ is sequenced before _D_, or
+- _A_ synchronizes with _D_, and both _A_ and _D_ are sequentially consistent
+	atomic operations, or
+- there are evaluations _B_ and _C_ such that _A_ is sequenced before _B_, _B_
+	happens before _C_, and _C_ is sequenced before _D_, or
+- there is an evaluation _B_ such that _A_ strongly happens before _B_, and _B_
+	strongly happens before _D_.
+
+> _Note 11_: Informally, if _A_ strongly happens before _B_, then _A_ appears to
+> be evaluated before _B_ in all contexts.
+
+A _visible side effect_ _A_ on a scalar object _M_ with respect to a value
+computation _B_ of _M_ satisfies the conditions:
+- _A_ happens before _B_ and
+- there is no other side effect _X_ to _M_ such that _A_ happens before _X_ and
+	_X_ happens before _B_.
+
+The value of a non-atomic scalar object _M_, as determined by evaluation _B_,
+shall be the value stored by the visible side effect _A_.
+
+> _Note 12_: If there is ambiguity about which side effect to a non-atomic
+> object is visible, then the behavior is either unspecified or undefined.
+
+> _Note 13_: This states that operations on ordinary objects are not visibly
+> reordered. This is not actually detectable without data races, but it is
+> necessary to ensure that data races, as defined below, and with suitable
+> restrictions on the use of atomics, correspond to data races in a simple
+> interleaved (sequentially consistent) execution.
+
+The value of an atomic object _M_, as determined by evaluation _B_, shall be the
+value stored by some side effect _A_ that modifies _M_, where _B_ does not
+happen before _A_.
+
+> _Note 14_: The set of such side effects is also restricted by the rest of the
+> rules described here, and in particular, by the coherence requirements below.
+
+If an operation _A_ that modifies an atomic object _M_ happens before an
+operation _B_ that modifies _M_, then _A_ shall be earlier than _B_ in the
+modification order of _M_.
+
+> _Note 15_: This requirement is known as write-write coherence.
+
+If a value computation _A_ of an atomic object _M_ happens before a value
+computation _B_ of _M_, and _A_ takes its value from a side effect _X_ on _M_,
+then the value computed by _B_ shall either be the value stored by _X_ or the
+value stored by a side effect _Y_ on _M_, where _Y_ follows _X_ in the
+modification order of _M_.
+
+> _Note 16_: This requirement is known as read-read coherence.
+
+If a value computation _A_ of an atomic object _M_ happens before an operation
+_B_ that modifies _M_, then _A_ shall take its value from a side effect _X_ on
+_M_, where _X_ precedes _B_ in the modification order of _M_.
+
+> _Note 17_: This requirement is known as read-write coherence.
+
+If a side effect _X_ on an atomic object _M_ happens before a value computation
+_B_ of _M_, then the evaluation _B_ shall take its value from _X_ or from a side
+effect _Y_ that follows _X_ in the modification order of _M_.
+
+> _Note 18_: This requirement is known as write-read coherence.
+
+> _Note 19_: The four preceding coherence requirements effectively disallow
+> compiler reordering of atomic operations to a single object, even if both
+> operations are relaxed loads. This effectively makes the cache coherence
+> guarantee provided by most hardware available to C++ atomic operations.
+
+> _Note 20_: The value observed by a load of an atomic depends on the “happens
+> before” relation, which depends on the values observed by loads of atomics.
+> The intended reading is that there must exist an association of atomic loads
+> with modifications they observe that, together with suitably chosen
+> modification orders and the “happens before” relation derived as described
+> above, satisfy the resulting constraints as imposed here.
+
+Two actions are _potentially concurrent_ if
+- they are performed by different threads, or
+- they are unsequenced, at least one is performed by a signal handler, and they
+	are not both performed by the same signal handler invocation.
+
+The execution of a program contains a _data race_ if it contains two potentially
+concurrent conflicting actions, at least one of which is not atomic, and neither
+happens before the other. Any such data race results in undefined behavior.
+
+> _Note 21_: It can be shown that programs that correctly use mutexes and
+> `SeqCst` operations to prevent all data races and use no other synchronization
+> operations behave as if the operations executed by their constituent threads
+> were simply interleaved, with each value computation of an object being taken
+> from the last side effect on that object in that interleaving. This is normally
+> referred to as “sequential consistency”. However, this applies only to
+> data-race-free programs, and data-race-free programs cannot observe most
+> program transformations that do not change single-threaded program semantics.
+> In fact, most single-threaded program transformations continue to be allowed,
+> since any program that behaves differently as a result has undefined behavior.
+
+> _Note 22_: Compiler transformations that introduce assignments to a
+> potentially shared memory location that would not be modified by the abstract
+> machine are generally precluded by this document, since such an assignment
+> might overwrite another assignment by a different thread in cases in which an
+> abstract machine execution would not have encountered a data race. This
+> includes implementations of data member assignment that overwrite adjacent
+> members in separate memory locations. Reordering of atomic loads in cases in
+> which the atomics in question might alias is also generally precluded, since
+> this could violate the coherence rules.
+
+> _Note 23_: Transformations that introduce a speculative read of a potentially
+> shared memory location might not preserve the semantics of the C++ program as
+> defined in this document, since they potentially introduce a data race.
+> However, they are typically valid in the context of an optimizing compiler
+> that targets a specific machine with well-defined semantics for data races.
+> They would be invalid for a hypothetical machine that is not tolerant of races
+> or provides hardware race detection.
+
+## Atomic orderings
+
+```rs
+// in ::core::sync::atomic
+#[non_exhaustive]
+pub enum Ordering {
+	Relaxed,
+	Release,
+	Acquire,
+	AcqRel,
+	SeqCst,
+}
+```
+
+The enumeration `Ordering` specifies the detailed regular (non-atomic) memory
+synchronization order as defined in this document and may provide for operation
+ordering. Its enumerated values and their meanings are as follows:
+- `Relaxed`: no operation orders memory.
+- `Release`, `AcqRel`, and `SeqCst`: a store operation performs a release
+	operation on the affected memory location.
+- `Acquire`, `AcqRel`, and `SeqCst`: a load operation performs an acquire
+	operation on the affected memory location.
+
+> _Note 2_: Atomic operations specifying `Relaxed` are relaxed with respect to
+> memory ordering. Implementations must still guarantee that any given atomic
+> access to a particular atomic object be indivisible with respect to all other
+> atomic accesses to that object.
+
+An atomic operation _A_ that performs a release operation on an atomic object
+_M_ synchronizes with an atomic operation _B_ that performs an acquire operation
+on _M_ and takes its value from any side effect in the release sequence headed
+by _A_.
+
+An atomic operation _A_ on some atomic object _M_ is coherence-ordered before
+another atomic operation _B_ on _M_ if
+- _A_ is a modification, and _B_ reads the value stored by _A_, or
+- _A_ precedes _B_ in the modification order of _M_, or
+- _A_ and _B_ are not the same atomic read-modify-write operation, and there
+	exists an atomic modification _X_ of _M_ such that _A_ reads the value
+	stored by _X_ and _X_ precedes _B_ in the modification order of _M_, or
+- there exists an atomic modification _X_ of _M_ such that _A_ is
+	coherence-ordered before _X_ and _X_ is coherence-ordered before _B_.
+
+There is a single total order _S_ on all `SeqCst` operations, including fences,
+that satisfies the following constraints. First, if _A_ and _B_ are `SeqCst`
+operations and _A_ strongly happens before _B_, then _A_ precedes _B_ in _S_.
+Second, for every pair of atomic operations _A_ and _B_ on an object _M_, where
+_A_ is coherence-ordered before _B_, the following four conditions are required
+to be satisfied by _S_:
+- if _A_ and _B_ are both `SeqCst` operations, then _A_ precedes _B_ in _S_; and
+- if _A_ is a `SeqCst` operation and _B_ happens before a `SeqCst` fence _Y_,
+	then _A_ precedes _Y_ in _S_; and
+- if a `SeqCst` fence _X_ happens before _A_ and _B_ is a `SeqCst` operation,
+	then _X_ precedes _B_ in _S_; and
+- if a `SeqCst` fence _X_ happens before _A_ and _B_ happens before a `SeqCst`
+	fence _Y_, then _X_ precedes _Y_ in _S_.
+
+> _Note 3_: This definition ensures that _S_ is consistent with the modification
+> order of any atomic object _M_. It also ensures that a `SeqCst` load _A_ of
+> _M_ gets its value either from the last modification of _M_ that precedes _A_
+> in _S_ or from some non-`SeqCst` modification of _M_ that does not happen
+> before any modification of _M_ that precedes _A_ in _S_.
+
+> _Note 4_: We do not require that _S_ be consistent with “happens before”. This
+> allows more efficient implementation of `Acquire` and `Release` on some
+> machine architectures. It can produce surprising results when these are mixed
+> with `SeqCst` accesses.
+
+> _Note 5_: `SeqCst` ensures sequential consistency only for a program that is
+> free of data races and uses exclusively `SeqCst` atomic operations. Any use of
+> weaker ordering will invalidate this guarantee unless extreme care is used. In
+> many cases, `SeqCst` atomic operations are reorderable with respect to other
+> atomic operations performed by the same thread.
+
+Implementations should ensure that no “out-of-thin-air” values are computed that
+circularly depend on their own computation.
+
+> _Note 6_: For example, with `x` and `y` initially zero,
+> ```rs
+> // Thread 1:
+> let r1 = y.load(atomic::Ordering::Relaxed);
+> x.store(r1, atomic::Ordering::Relaxed);
+> // Thread 2:
+> let r2 = x.load(atomic::Ordering::Relaxed);
+> y.store(r2, atomic::Ordering::Relaxed);
+> ```
+> this recommendation discourages producing `r1 == r2 == 42`, since the store of
+> 42 to `y` is only possible if the store to `x` stores `42`, which circularly
+> depends on the store to `y` storing `42`. Note that without this restriction,
+> such an execution is possible.
+
+> _Note 7_: The recommendation similarly disallows `r1 == r2 == 42` in the
+> following example, with `x` and `y` again initially zero:
+> ```rs
+> // Thread 1:
+> let r1 = x.load(atomic::Ordering::Relaxed);
+> if r1 == 42 {
+>     y.store(42, atomic::Ordering::Relaxed);
+> }
+> // Thread 2:
+> let r2 = y.load(atomic::Ordering::Relaxed);
+> if r2 == 42 {
+>     x.store(42, atomic::Ordering::Relaxed);
+> }
+> ```
+
+Atomic read-modify-write operations shall always read the last value (in the
+modification order) written before the write associated with the
+read-modify-write operation.
+
+Implementations should make atomic stores visible to atomic loads within a
+reasonable amount of time.
+
+## Atomic fences
+
+This subclause introduces synchronization primitives called _fences_. Fences can
+have acquire semantics, release semantics, or both. A fence with acquire
+semantics is called an _acquire fence_. A fence with release semantics is called
+a _release fence_.
+
+A release fence _A_ synchronizes with an acquire fence _B_ if there exist atomic
+operations _X_ and _Y_, both operating on some atomic object _M_, such that _A_
+is sequenced before _X_, _X_ modifies _M_, _Y_ is sequenced before _B_, and _Y_
+reads the value written by _X_ or a value written by any side effect in the
+hypothetical release sequence _X_ would head if it were a release operation.
+
+A release fence _A_ synchronizes with an atomic operation _B_ that performs an
+acquire operation on an atomic object _M_ if there exists an atomic operation
+_X_ such that _A_ is sequenced before _X_, _X_ modifies _M_, and _B_ reads the
+value written by _X_ or a value written by any side effect in the hypothetical
+release sequence _X_ would head if it were a release operation.
+
+An atomic operation _A_ that is a release operation on an atomic object _M_
+synchronizes with an acquire fence _B_ if there exists some atomic operation _X_
+on _M_ such that _X_ is sequenced before _B_ and reads the value written by _A_
+or a value written by any side effect in the release sequence headed by _A_.
+
+```rs
+pub fn fence(order: Ordering);
+```
+
+_Effects_: Depending on the value of `order`, this operation:
+- has no effects, if `order == Relaxed`;
+- is an acquire fence, if `order == Acquire`;
+- is a release fence, if `order == Release`;
+- is both an acquire and a release fence, if `order == AcqRel`;
+- is a sequentially consistent acquire and release fence, if `order == SeqCst`.
+
+```rs
+pub fn compiler_fence(order: Ordering);
+```
+
+_Effects_: Equivalent to `fence(order)`, except that the resulting ordering
+constraints are established only between a thread and a signal handler executed
+in the same thread.
+
+> _Note 1_: `compiler_fence` can be used to specify the order in which actions
+> performed by the thread become visible to the signal handler. Compiler
+> optimizations and reorderings of loads and stores are inhibited in the same
+> way as with `fence` but the hardware fence instructions that `fence` would
+> have inserted are not emitted.
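A rough sketch of two of the synchronizes-with rules from `specification.md`: a store-release paired with a load-acquire, and a release fence paired with an acquire fence. The names `publish_with_release_acquire`, `publish_with_fences`, `payload` and `ready` are hypothetical. As a simplifying assumption, `payload` is itself a relaxed atomic so that the example stays in safe Rust; the rules apply equally to prior non-atomic writes.

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicU32, Ordering};
use std::thread;

/// The release store to `ready` synchronizes with the acquire load that
/// reads `true`, so the earlier store to `payload` happens before the
/// reader's load of `payload`.
pub fn publish_with_release_acquire() -> u32 {
    let payload = AtomicU32::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            payload.store(42, Ordering::Relaxed);
            ready.store(true, Ordering::Release);
        });
        let reader = s.spawn(|| {
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            payload.load(Ordering::Relaxed)
        });
        reader.join().unwrap()
    })
}

/// Same protocol, but the ordering comes from fences: the release fence is
/// sequenced before the relaxed store to `ready`, and the relaxed load that
/// reads `true` is sequenced before the acquire fence.
pub fn publish_with_fences() -> u32 {
    let payload = AtomicU32::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            payload.store(42, Ordering::Relaxed);
            fence(Ordering::Release);
            ready.store(true, Ordering::Relaxed);
        });
        let reader = s.spawn(|| {
            while !ready.load(Ordering::Relaxed) {
                std::hint::spin_loop();
            }
            fence(Ordering::Acquire);
            payload.load(Ordering::Relaxed)
        });
        reader.join().unwrap()
    })
}
```

In both functions the store of `42` happens before the final load, so write-read coherence rules out the initial `0` and the reader returns `42`. If the flag accesses were relaxed and the fences removed, that guarantee would no longer hold.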