Write “Multithreaded Execution” and add simplified atomic spec

pull/378/head
SabrinaJewson 2 years ago
parent 8d1e4dccf7
commit 0347b0183f

@ -41,7 +41,14 @@
 * [Concurrency](concurrency.md)
 * [Races](races.md)
 * [Send and Sync](send-and-sync.md)
-* [Atomics](atomics.md)
+* [Atomics](./atomics/atomics.md)
+* [Multithreaded Execution](./atomics/multithread.md)
+* [Relaxed](./atomics/relaxed.md)
+* [Acquire and Release](./atomics/acquire-release.md)
+* [SeqCst](./atomics/seqcst.md)
+* [Fences](./atomics/fences.md)
+* [Signals](./atomics/signals.md)
+* [Specification](./atomics/specification.md)
 * [Implementing Vec](./vec/vec.md)
 * [Layout](./vec/vec-layout.md)
 * [Allocating](./vec/vec-alloc.md)

@ -0,0 +1 @@
# Acquire and Release

@ -17,12 +17,24 @@ details, you should check out the [C++ specification][C++-model].
 Still, we'll try to cover the basics and some of the problems Rust developers
 face.
-The C++ memory model is fundamentally about trying to bridge the gap between the
-semantics we want, the optimizations compilers want, and the inconsistent chaos
-our hardware wants. *We* would like to just write programs and have them do
-exactly what we said but, you know, fast. Wouldn't that be great?
-## Compiler Reordering
+## Motivation
+The C++ memory model is very large and confusing with lots of seemingly
+arbitrary design decisions. To understand the motivation behind this, it can
+help to look at what got us in this situation in the first place. There are
+three main factors at play here:
+1. Users of the language, who want fast, cross-platform code;
+2. compilers, who want to optimize code to make it fast;
+3. and the hardware, which is ready to unleash a wrath of inconsistent chaos on
+your program at a moment's notice.
+The C++ memory model is fundamentally about trying to bridge the gap between
+these three, allowing users to write code for a logical and consistent abstract
+machine while the compiler and hardware deal with the madness underneath that
+makes it run fast.
+### Compiler Reordering
 Compilers fundamentally want to be able to do all sorts of complicated
 transformations to reduce data dependencies and eliminate dead code. In
@ -53,7 +65,7 @@ able to make these kinds of optimizations, because they can seriously improve
 performance. On the other hand, we'd also like to be able to depend on our
 program *doing the thing we said*.
-## Hardware Reordering
+### Hardware Reordering
 On the other hand, even if the compiler totally understood what we wanted and
 respected our wishes, our hardware might instead get us in trouble. Trouble
@ -106,6 +118,8 @@ programming:
 incorrect. If possible, concurrent algorithms should be tested on
 weakly-ordered hardware.
+---
 ## Data Accesses
 The C++ memory model attempts to bridge the gap by allowing us to talk about the

@ -0,0 +1,220 @@
# Multithreaded Execution
The first important thing to understand about C++20 atomics is that **the
abstract machine has no concept of time**. You might expect there to be a single
global ordering of events across the program where each happens at the same time
or one after the other, but under the abstract model no such ordering exists;
instead, a possible execution of the program must be treated as a single event
that happens instantaneously — there is never any such thing as “now”, or a
“latest value”, and using that terminology will only lead you to more confusion.
(Of course, in reality there does exist a concept of time, but you must keep in
mind that you're not programming for the hardware, you're programming for the
AM.)
However, while no global ordering of operations exists _between_ threads, there
does exist a single total ordering _within_ each thread, which is known as its
_sequence_. For example, given this simple Rust program:
```rs
println!("A");
println!("B");
```
its sequence during one possible execution can be visualized like so:
```text
╭───────────────╮
│ println!("A") │
╰───────╥───────╯
╭───────⇓───────╮
│ println!("B") │
╰───────────────╯
```
That double arrow in between the two boxes (`⇓`) represents that the second
statement is _sequenced after_ the first (and similarly the first statement is
_sequenced before_ the second). This is the strongest kind of ordering guarantee
between any two operations, and only comes about when those two operations
happen one after the other and on the same thread.
If we add a second thread to the mix:
```rs
// Thread 1:
println!("A");
println!("B");
// Thread 2:
eprintln!("01");
eprintln!("02");
```
it will simply coexist in parallel, with each thread getting its own independent
sequence:
```text
Thread 1 Thread 2
╭───────────────╮ ╭─────────────────╮
│ println!("A") │ │ eprintln!("01") │
╰───────╥───────╯ ╰────────╥────────╯
╭───────⇓───────╮ ╭────────⇓────────╮
│ println!("B") │ │ eprintln!("02") │
╰───────────────╯ ╰─────────────────╯
```
Note that this is **not** a representation of multiple things that _could_
happen at runtime — instead, this diagram describes exactly what _did_ happen
when the program ran once. This distinction is key, because it highlights that
even the lowest-level representation of a program's execution does not have
a global ordering between threads; those two disconnected chains are all there
is.
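The two-thread program above can be made runnable. The following is a minimal sketch, assuming `std::thread::scope` is available; the helper name `run_two_threads` and the per-thread logs are illustrative additions, not part of the original example. Each thread records its operations in its own `Vec`, so the only order that is captured is the per-thread sequence, and nothing records an order between the two threads.

```rs
use std::thread;

/// Runs the two threads from the example and returns what each one did,
/// in the order of its own sequence. (Hypothetical helper for illustration.)
pub fn run_two_threads() -> (Vec<&'static str>, Vec<&'static str>) {
    thread::scope(|s| {
        // Thread 1
        let t1 = s.spawn(|| {
            let mut log = Vec::new();
            println!("A");
            log.push("A");
            println!("B");
            log.push("B");
            log
        });
        // Thread 2
        let t2 = s.spawn(|| {
            let mut log = Vec::new();
            eprintln!("01");
            log.push("01");
            eprintln!("02");
            log.push("02");
            log
        });
        (t1.join().unwrap(), t2.join().unwrap())
    })
}
```

Each returned log always has the same internal order, while the interleaving of the printed lines on the terminal may differ from run to run.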
Now let's make things more interesting by introducing some shared data, and have
both threads read it.
```rs
// Initial state
let data = 0;
// Thread 1:
data;
// Thread 2:
data;
```
Each memory location, similarly to threads, can be shown as another column on
our diagram, but holding values instead of instructions, and each access (read
or write) manifests as a line from the instruction that performed the access to
the associated value in the column. So this code can produce (and is in fact
guaranteed to produce) the following execution:
```text
Thread 1 data Thread 2
╭──────╮ ┌────┐ ╭──────╮
│ data ├╌╌╌╌┤ 0 ├╌╌╌╌┤ data │
╰──────╯ └────┘ ╰──────╯
```
That is, both threads read the same value of `0` from `data`, with no relative
ordering between them. This is the simple case, for when the data doesn't ever
change, but that's no fun, so let's add some mutability into the mix (we'll also
return to a single thread, just to keep things simple).
Consider this code, which we're going to attempt to draw a diagram for like
above:
```rs
let mut data = 0;
data = 1;
data;
data = 2;
```
Working out executions of code like this is rather like solving a Sudoku puzzle:
you must first lay out all the facts that you know, and then fill in the blanks
with logical reasoning. The initial information we've been given is both the
initial value of `data` and the sequential order of Thread 1; we also know that
over its lifetime, `data` takes on a total of three different values that were
caused by two different non-atomic writes. This allows us to start drawing out
some boxes:
```text
Thread 1 data
╭───────╮ ┌────┐
│ = 1 ├╌? │ 0 │
╰───╥───╯ ?╌┼╌╌╌╌┤
╭───⇓───╮ ?╌┼╌╌╌╌┤
│ data ├╌? │ ? │
╰───╥───╯ ?╌┼╌╌╌╌┤
╭───⇓───╮ ?╌┼╌╌╌╌┤
│ = 2 ├╌? │ ? │
╰───────╯ └────┘
```
Note the use of dashed padding in between the values of `data`'s column. Those
spaces won't ever contain a value, but they're used to represent an
unsynchronized (non-atomic) write — it is garbage data and attempting to read it
would result in a data race.
To solve this puzzle, we first need to bring in a new rule that governs all
memory accesses to a particular location:
> From the point at which the access occurs, find every other point that can be
> reached by following the reverse direction of arrows, then for each one of
> those, take a single step across every line that connects to the relevant
> memory location. **It is not allowed for the access to read or write any value
> that appears above any one of these points**.
In our case, there are two potential executions: one, where the first write
corresponds to the first value in `data`, and two, where the first write
corresponds to the second value in `data`. Considering the second case for a
moment, it would also force the second write to correspond to the first
value in `data`. Therefore its diagram would look something like this:
```text
Thread 1 data
╭───────╮ ┌────┐
│ = 1 ├╌╌┐ │ 0 │
╰───╥───╯ ┊ ┌╌╌┼╌╌╌╌┤
╭───⇓───╮ ┊ ├╌╌┼╌╌╌╌┤
│ data ├╌?┊ ┊ │ 2 │
╰───╥───╯ ├╌┼╌╌┼╌╌╌╌┤
╭───⇓───╮ └╌┼╌╌┼╌╌╌╌┤
│ = 2 ├╌╌╌╌┘ │ 1 │
╰───────╯ └────┘
```
However, that second line breaks the rule we just established! Following up the
arrows from the third operation in Thread 1, we reach the first operation, and
from there we can take a single step to reach the space in between the `2` and
the `1`, which excludes this access from writing any value above that point.
So evidently, this execution is no good. We can therefore conclude that the only
possible execution of this program is the other one, in which the `1` appears
above the `2`:
```text
Thread 1 data
╭───────╮ ┌────┐
│ = 1 ├╌╌┐ │ 0 │
╰───╥───╯ ├╌╌┼╌╌╌╌┤
╭───⇓───╮ └╌╌┼╌╌╌╌┤
│ data ├╌? │ 1 │
╰───╥───╯ ┌╌╌┼╌╌╌╌┤
╭───⇓───╮ ├╌╌┼╌╌╌╌┤
│ = 2 ├╌╌┘ │ 2 │
╰───────╯ └────┘
```
Now to sort out the read operation in the middle. We can use the same rule as
before to trace up to the first write and rule out us reading either the `0`
value or the garbage that exists between it and `1`, but how do we choose
between the `1` and the `2`? Well, as it turns out there is a complement to the
rule we already defined which gives us the exact answer we need:
> From the point at which the access occurs, find every other point that can be
> reached by following the _forward_ direction of arrows, then for each one of
> those, take a single step across every line that connects to the relevant
> memory location. **It is not allowed for the access to read or write any value
> that appears below any one of these points**.
Using this rule, we can follow the arrow downwards and then across and finally
rule out `2` as well as the garbage before it. This leaves us with exactly _one_
value that the read operation can return, and exactly one possible execution
guaranteed by the Abstract Machine:
```text
Thread 1 data
╭───────╮ ┌────┐
│ = 1 ├╌╌┐ │ 0 │
╰───╥───╯ ├╌╌┼╌╌╌╌┤
╭───⇓───╮ └╌╌┼╌╌╌╌┤
│ data ├╌╌╌╌╌┤ 1 │
╰───╥───╯ ┌╌╌┼╌╌╌╌┤
╭───⇓───╮ ├╌╌┼╌╌╌╌┤
│ = 2 ├╌╌┘ │ 2 │
╰───────╯ └────┘
```
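The conclusion of the puzzle can be checked with a small runnable sketch. The function name `read_between_writes` and the `observed` binding are illustrative additions; the body is otherwise the same three operations as in the example.

```rs
/// Single-threaded version of the example: write 1, read, write 2.
/// Returns the value observed by the read in the middle.
#[allow(unused_assignments)]
pub fn read_between_writes() -> i32 {
    let mut data = 0;
    data = 1;
    // This read is sequenced after `= 1` and sequenced before `= 2`.
    let observed = data;
    data = 2;
    let _ = data; // use the final value so the last write is not dead code
    observed
}
```

The read can only return `1`, matching the single execution in the final diagram.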
You might be thinking that all this has been is the longest, most convoluted
explanation ever of the most basic intuitive semantics of programming — and
you'd be absolutely right. But it's essential to grasp these fundamentals,
because once you have this model in mind, the extension into multiple threads
and the complicated semantics of real atomics becomes completely natural.

@ -0,0 +1,43 @@
# Relaxed
Now that we've got single-threaded mutation semantics out of the way, we can try
reintroducing a second thread. We'll have one thread perform a write to the
memory location, and a second thread read from it, like so:
```rs
// Initial state
let mut data = 0;
// Thread 1:
data = 1;
// Thread 2:
data;
```
Of course, any Rust programmer will immediately tell you that this code doesn't
compile, and indeed it does not, for good reason. But suspend
your disbelief for a moment, and imagine what would happen if it did. Let's draw
a diagram, leaving out the reading lines for now:
```text
Thread 1 data Thread 2
╭───────╮ ┌────┐ ╭───────╮
│ = 1 ├╌┐ │ 0 │ ?╌┤ data │
╰───────╯ ├╌┼╌╌╌╌┤ ╰───────╯
└╌┼╌╌╌╌┤
│ 1 │
└────┘
```
Let's try to figure out where the line in Thread 2's access joins up. The rules
from before don't help us much unfortunately since there are no arrows
connecting that operation to anything, so we can't immediately rule anything
out. As a result, we end up facing a situation we haven't faced before: there is
_more than one_ potential value for Thread 2 to read.
And this is where we encounter the big limitation with unsynchronized data
accesses: the price we pay for their speed and optimization capability is that
this situation is considered **Undefined Behavior**. For an unsynchronized read
to be acceptable, there has to be _exactly one_ potential value for it to read,
and when there are multiple like in this situation it is considered a data race.
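For contrast, here is a minimal sketch of the same two-thread shape written so that it does compile, under the assumption that an `AtomicU32` accessed with `Ordering::Relaxed` stands in for `data`. The function name `relaxed_read` is illustrative. Because the accesses are atomic, having more than one potential value to read is allowed rather than being a data race.

```rs
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// One thread writes 1, another reads; returns what the reader saw.
/// (Hypothetical helper for illustration.)
pub fn relaxed_read() -> u32 {
    let data = AtomicU32::new(0);
    thread::scope(|s| {
        // Thread 1: write.
        s.spawn(|| data.store(1, Ordering::Relaxed));
        // Thread 2: read. Nothing orders this load relative to the store.
        let reader = s.spawn(|| data.load(Ordering::Relaxed));
        reader.join().unwrap()
    })
}
```

Either `0` or `1` is a valid result here, and which one appears may vary between runs.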
## “Out-of-thin-air” values

@ -0,0 +1,3 @@
# Signals
(and compiler fences)

@ -0,0 +1,354 @@
# Specification
Below is a modified C++20 specification draft (as it was on 2022-07-16), edited
to remove C++-only features like consume orderings and `sig_atomic_t`.
Note that although this has been checked, atomics are very difficult to get
right and so there may be subtle mistakes. If you want to more formally check
your software, read the [\[intro.races\]], [\[atomics.order\]] and
[\[atomics.fences\]] sections of the real C++ specification.
[\[intro.races\]]: https://eel.is/c++draft/intro.races
[\[atomics.order\]]: https://eel.is/c++draft/atomics.order
[\[atomics.fences\]]: https://eel.is/c++draft/atomics.fences
## Data races
The value of an object visible to a thread _T_ at a particular point is the
initial value of the object, a value assigned to the object by _T_, or a value
assigned to the object by another thread, according to the rules below.
> _Note 1_: In some cases, there might instead be undefined behavior. Much of
> this subclause is motivated by the desire to support atomic operations with
> explicit and detailed visibility constraints. However, it also implicitly
> supports a simpler view for more restricted programs.
Two expression evaluations _conflict_ if one of them modifies a memory location
and the other one reads or modifies the same memory location.
The library defines a number of atomic operations and operations on mutexes that
are specially identified as synchronization operations. These operations play a
special role in making assignments in one thread visible to another. A
synchronization operation on one or more memory locations is either an acquire
operation, a release operation, or both an acquire and release operation. A
synchronization operation without an associated memory location is a fence and
can be either an acquire fence, a release fence, or both an acquire and release
fence. In addition, there are relaxed atomic operations, which are not
synchronization operations, and atomic read-modify-write operations, which have
special characteristics.
> _Note 2_: For example, a call that acquires a mutex will perform an acquire
> operation on the locations comprising the mutex. Correspondingly, a call that
> releases the same mutex will perform a release operation on those same
> locations. Informally, performing a release operation on _A_ forces prior side
> effects on other memory locations to become visible to other threads that
> later perform an acquire operation on _A_. “Relaxed” atomic operations are not
> synchronization operations even though, like synchronization operations, they
> cannot contribute to data races.
All modifications to a particular atomic object _M_ occur in some particular
total order, called the _modification order_ of _M_.
> _Note 3_: There is a separate order for each atomic object. There is no
> requirement that these can be combined into a single total order for all
> objects. In general this will be impossible since different threads can
> observe modifications to different objects in inconsistent orders.
A _release sequence_ headed by a release operation _A_ on an atomic object _M_
is a maximal contiguous sub-sequence of side effects in the modification order
of _M_, where the first operation is _A_, and every subsequent operation is an
atomic read-modify-write operation.
Certain library calls _synchronize with_ other library calls performed by
another thread. For example, an atomic store-release synchronizes with a
load-acquire that takes its value from the store.
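As a rough illustration of a store-release synchronizing with a load-acquire, the sketch below passes a value from one thread to another. The names `message_passing`, `payload` and `ready` are assumptions made for this example, and the payload is itself a relaxed atomic so that the code stays in safe Rust.

```rs
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

/// Publishes 42 behind a release store and reads it after an acquire load.
/// (Hypothetical helper for illustration.)
pub fn message_passing() -> u32 {
    let payload = AtomicU32::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            payload.store(42, Ordering::Relaxed);
            // Release store: the payload write is sequenced before it.
            ready.store(true, Ordering::Release);
        });
        let consumer = s.spawn(|| {
            // Spin until the acquire load takes its value from the release store.
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            // The payload write now happens before this load.
            payload.load(Ordering::Relaxed)
        });
        consumer.join().unwrap()
    })
}
```

Once the consumer observes `true`, the store of `42` happens before its final load, so the load cannot return the initial `0`.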
> _Note 4_: Except in the specified cases, reading a later value does not
> necessarily ensure visibility as described below. Such a requirement would
> sometimes interfere with efficient implementation.
> _Note 5_: The specifications of the synchronization operations define when one
> reads the value written by another. For atomic objects, the definition is
> clear. All operations on a given mutex occur in a single total order. Each
> mutex acquisition “reads the value written” by the last mutex release.
An evaluation _A_ _happens before_ an evaluation _B_ (or, equivalently, _B_
_happens after_ _A_) if either:
- _A_ is sequenced before _B_, or
- _A_ synchronizes with _B_, or
- for some evaluation _X_, _A_ happens before _X_ and _X_ happens before _B_.
An evaluation _A_ _strongly happens before_ an evaluation _D_ if either:
- _A_ is sequenced before _D_, or
- _A_ synchronizes with _D_, and both _A_ and _D_ are sequentially consistent
atomic operations, or
- there are evaluations _B_ and _C_ such that _A_ is sequenced before _B_, _B_
happens before _C_, and _C_ is sequenced before _D_, or
- there is an evaluation _B_ such that _A_ strongly happens before _B_, and _B_
strongly happens before _D_.
> _Note 11_: Informally, if _A_ strongly happens before _B_, then _A_ appears to
> be evaluated before _B_ in all contexts.
A _visible side effect_ _A_ on a scalar object _M_ with respect to a value
computation _B_ of _M_ satisfies the conditions:
- _A_ happens before _B_ and
- there is no other side effect _X_ to _M_ such that _A_ happens before _X_ and
_X_ happens before _B_.
The value of a non-atomic scalar object _M_, as determined by evaluation _B_,
shall be the value stored by the visible side effect _A_.
> _Note 12_: If there is ambiguity about which side effect to a non-atomic
> object is visible, then the behavior is either unspecified or undefined.
> _Note 13_: This states that operations on ordinary objects are not visibly
> reordered. This is not actually detectable without data races, but it is
> necessary to ensure that data races, as defined below, and with suitable
> restrictions on the use of atomics, correspond to data races in a simple
> interleaved (sequentially consistent) execution.
The value of an atomic object _M_, as determined by evaluation _B_, shall be the
value stored by some side effect _A_ that modifies _M_, where _B_ does not
happen before _A_.
> _Note 14_: The set of such side effects is also restricted by the rest of the
> rules described here, and in particular, by the coherence requirements below.
If an operation _A_ that modifies an atomic object _M_ happens before an
operation _B_ that modifies _M_, then _A_ shall be earlier than _B_ in the
modification order of _M_.
> _Note 15_: This requirement is known as write-write coherence.
If a value computation _A_ of an atomic object _M_ happens before a value
computation _B_ of _M_, and _A_ takes its value from a side effect _X_ on _M_,
then the value computed by _B_ shall either be the value stored by _X_ or the
value stored by a side effect _Y_ on _M_, where _Y_ follows _X_ in the
modification order of _M_.
> _Note 16_: This requirement is known as read-read coherence.
If a value computation _A_ of an atomic object _M_ happens before an operation
_B_ that modifies _M_, then _A_ shall take its value from a side effect _X_ on
_M_, where _X_ precedes _B_ in the modification order of _M_.
> _Note 17_: This requirement is known as read-write coherence.
If a side effect _X_ on an atomic object _M_ happens before a value computation
_B_ of _M_, then the evaluation _B_ shall take its value from _X_ or from a side
effect _Y_ that follows _X_ in the modification order of _M_.
> _Note 18_: This requirement is known as write-read coherence.
> _Note 19_: The four preceding coherence requirements effectively disallow
> compiler reordering of atomic operations to a single object, even if both
> operations are relaxed loads. This effectively makes the cache coherence
> guarantee provided by most hardware available to C++ atomic operations.
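
The coherence requirements can be observed even with relaxed operations. The following is a small sketch, assuming a single writer that stores increasing values; the names `observe_pairs`, `writes` and `samples` are illustrative. Write-write coherence puts the stores in the modification order in program order, and read-read coherence then forbids a later load in the reader from seeing an earlier value than a preceding load did.

```rs
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Returns pairs `(first, second)` of consecutive relaxed loads made while
/// another thread stores 1, 2, ..., `writes`. (Hypothetical helper.)
pub fn observe_pairs(writes: u32, samples: usize) -> Vec<(u32, u32)> {
    let counter = AtomicU32::new(0);
    thread::scope(|s| {
        s.spawn(|| {
            // Each store is sequenced before the next, so the modification
            // order of `counter` is 0, 1, 2, ..., writes.
            for i in 1..=writes {
                counter.store(i, Ordering::Relaxed);
            }
        });
        let reader = s.spawn(|| {
            (0..samples)
                .map(|_| {
                    let first = counter.load(Ordering::Relaxed);
                    let second = counter.load(Ordering::Relaxed);
                    (first, second)
                })
                .collect::<Vec<_>>()
        });
        reader.join().unwrap()
    })
}
```

In every returned pair the second value is greater than or equal to the first, although the exact values differ between runs.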
> _Note 20_: The value observed by a load of an atomic depends on the “happens
> before” relation, which depends on the values observed by loads of atomics.
> The intended reading is that there must exist an association of atomic loads
> with modifications they observe that, together with suitably chosen
> modification orders and the “happens before” relation derived as described
> above, satisfy the resulting constraints as imposed here.
Two actions are _potentially concurrent_ if
- they are performed by different threads, or
- they are unsequenced, at least one is performed by a signal handler, and they
are not both performed by the same signal handler invocation.
The execution of a program contains a _data race_ if it contains two potentially
concurrent conflicting actions, at least one of which is not atomic, and neither
happens before the other. Any such data race results in undefined behavior.
> _Note 21_: It can be shown that programs that correctly use mutexes and
> `SeqCst` operations to prevent all data races and use no other synchronization
> operations behave as if the operations executed by their constituent threads
> were simply interleaved, with each value computation of an object being taken
> from the last side effect on that object in that interleaving. This is normally
> referred to as “sequential consistency”. However, this applies only to
> data-race-free programs, and data-race-free programs cannot observe most
> program transformations that do not change single-threaded program semantics.
> In fact, most single-threaded program transformations continue to be allowed,
> since any program that behaves differently as a result has undefined behavior.
> _Note 22_: Compiler transformations that introduce assignments to a
> potentially shared memory location that would not be modified by the abstract
> machine are generally precluded by this document, since such an assignment
> might overwrite another assignment by a different thread in cases in which an
> abstract machine execution would not have encountered a data race. This
> includes implementations of data member assignment that overwrite adjacent
> members in separate memory locations. Reordering of atomic loads in cases in
> which the atomics in question might alias is also generally precluded, since
> this could violate the coherence rules.
> _Note 23_: Transformations that introduce a speculative read of a potentially
> shared memory location might not preserve the semantics of the C++ program as
> defined in this document, since they potentially introduce a data race.
> However, they are typically valid in the context of an optimizing compiler
> that targets a specific machine with well-defined semantics for data races.
> They would be invalid for a hypothetical machine that is not tolerant of races
> or provides hardware race detection.
## Atomic orderings
```rs
// in ::core::sync::atomic
#[non_exhaustive]
pub enum Ordering {
    Relaxed,
    Release,
    Acquire,
    AcqRel,
    SeqCst,
}
```
The enumeration `Ordering` specifies the detailed regular (non-atomic) memory
synchronization order as defined in this document and may provide for operation
ordering. Its enumerated values and their meanings are as follows:
- `Relaxed`: no operation orders memory.
- `Release`, `AcqRel`, and `SeqCst`: a store operation performs a release
operation on the affected memory location.
- `Acquire`, `AcqRel`, and `SeqCst`: a load operation performs an acquire
operation on the affected memory location.
> _Note 2_: Atomic operations specifying `Relaxed` are relaxed with respect to
> memory ordering. Implementations must still guarantee that any given atomic
> access to a particular atomic object be indivisible with respect to all other
> atomic accesses to that object.
An atomic operation _A_ that performs a release operation on an atomic object
_M_ synchronizes with an atomic operation _B_ that performs an acquire operation
on _M_ and takes its value from any side effect in the release sequence headed
by _A_.
An atomic operation _A_ on some atomic object _M_ is coherence-ordered before
another atomic operation _B_ on _M_ if
- _A_ is a modification, and _B_ reads the value stored by _A_, or
- _A_ precedes _B_ in the modification order of _M_, or
- _A_ and _B_ are not the same atomic read-modify-write operation, and there
exists an atomic modification _X_ of _M_ such that _A_ reads the value
stored by _X_ and _X_ precedes _B_ in the modification order of _M_, or
- there exists an atomic modification _X_ of _M_ such that _A_ is
coherence-ordered before _X_ and _X_ is coherence-ordered before _B_.
There is a single total order _S_ on all `SeqCst` operations, including fences,
that satisfies the following constraints. First, if _A_ and _B_ are `SeqCst`
operations and _A_ strongly happens before _B_, then _A_ precedes _B_ in _S_.
Second, for every pair of atomic operations _A_ and _B_ on an object _M_, where
_A_ is coherence-ordered before _B_, the following four conditions are required
to be satisfied by _S_:
- if _A_ and _B_ are both `SeqCst` operations, then _A_ precedes _B_ in _S_; and
- if _A_ is a `SeqCst` operation and _B_ happens before a `SeqCst` fence _Y_,
then _A_ precedes _Y_ in _S_; and
- if a `SeqCst` fence _X_ happens before _A_ and _B_ is a `SeqCst` operation,
then _X_ precedes _B_ in _S_; and
- if a `SeqCst` fence _X_ happens before _A_ and _B_ happens before a `SeqCst`
fence _Y_, then _X_ precedes _Y_ in _S_.
> _Note 3_: This definition ensures that _S_ is consistent with the modification
> order of any atomic object _M_. It also ensures that a `SeqCst` load _A_ of
> _M_ gets its value either from the last modification of _M_ that precedes _A_
> in _S_ or from some non-`SeqCst` modification of _M_ that does not happen
> before any modification of _M_ that precedes _A_ in _S_.
> _Note 4_: We do not require that _S_ be consistent with “happens before”. This
> allows more efficient implementation of `Acquire` and `Release` on some
> machine architectures. It can produce surprising results when these are mixed
> with `SeqCst` accesses.
> _Note 5_: `SeqCst` ensures sequential consistency only for a program that is
> free of data races and uses exclusively `SeqCst` atomic operations. Any use of
> weaker ordering will invalidate this guarantee unless extreme care is used. In
> many cases, `SeqCst` atomic operations are reorderable with respect to other
> atomic operations performed by the same thread.
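
One consequence of the single total order _S_ can be sketched with the classic "store buffering" shape. The function name `store_buffering` is illustrative. Since all four operations are `SeqCst`, one of the two stores has to come first in _S_, so the load in the other thread cannot miss it and at least one thread reads `1`.

```rs
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Each thread stores to one variable and then loads the other.
/// Returns `(r1, r2)`, the values the two loads observed.
/// (Hypothetical helper for illustration.)
pub fn store_buffering() -> (u32, u32) {
    let x = AtomicU32::new(0);
    let y = AtomicU32::new(0);
    thread::scope(|s| {
        let t1 = s.spawn(|| {
            x.store(1, Ordering::SeqCst);
            y.load(Ordering::SeqCst)
        });
        let t2 = s.spawn(|| {
            y.store(1, Ordering::SeqCst);
            x.load(Ordering::SeqCst)
        });
        (t1.join().unwrap(), t2.join().unwrap())
    })
}
```

With weaker orderings such as `Release` stores and `Acquire` loads, the outcome `(0, 0)` would no longer be ruled out by these rules.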
Implementations should ensure that no “out-of-thin-air” values are computed that
circularly depend on their own computation.
> _Note 6_: For example, with `x` and `y` initially zero,
> ```rs
> // Thread 1:
> let r1 = y.load(atomic::Ordering::Relaxed);
> x.store(r1, atomic::Ordering::Relaxed);
> // Thread 2:
> let r2 = x.load(atomic::Ordering::Relaxed);
> y.store(r2, atomic::Ordering::Relaxed);
> ```
> this recommendation discourages producing `r1 == r2 == 42`, since the store of
> 42 to `y` is only possible if the store to `x` stores `42`, which circularly
> depends on the store to `y` storing `42`. Note that without this restriction,
> such an execution is possible.
> _Note 7_: The recommendation similarly disallows `r1 == r2 == 42` in the
> following example, with `x` and `y` again initially zero:
> ```rs
> // Thread 1:
> let r1 = x.load(atomic::Ordering::Relaxed);
> if r1 == 42 {
>     y.store(42, atomic::Ordering::Relaxed);
> }
> // Thread 2:
> let r2 = y.load(atomic::Ordering::Relaxed);
> if r2 == 42 {
>     x.store(42, atomic::Ordering::Relaxed);
> }
> ```
Atomic read-modify-write operations shall always read the last value (in the
modification order) written before the write associated with the
read-modify-write operation.
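A common place where this rule matters is a shared counter. The sketch below assumes `fetch_add` with `Ordering::Relaxed`; the names `count_with_fetch_add`, `threads` and `per_thread` are illustrative. Because every read-modify-write reads the value immediately before its own write in the modification order, no increment can be lost, even without any stronger ordering.

```rs
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Increments a shared counter from several threads and returns the total.
/// (Hypothetical helper for illustration.)
pub fn count_with_fetch_add(threads: usize, per_thread: usize) -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Atomic read-modify-write: reads the latest value in the
                    // modification order and writes that value plus one.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    // The scope has joined every thread, so all increments are visible here.
    counter.load(Ordering::Relaxed)
}
```

Replacing the `fetch_add` with a separate relaxed load followed by a relaxed store would give up this guarantee and could lose updates.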
Implementations should make atomic stores visible to atomic loads within a
reasonable amount of time.
## Atomic fences
This subclause introduces synchronization primitives called _fences_. Fences can
have acquire semantics, release semantics, or both. A fence with acquire
semantics is called an _acquire fence_. A fence with release semantics is called
a _release fence_.
A release fence _A_ synchronizes with an acquire fence _B_ if there exist atomic
operations _X_ and _Y_, both operating on some atomic object _M_, such that _A_
is sequenced before _X_, _X_ modifies _M_, _Y_ is sequenced before _B_, and _Y_
reads the value written by _X_ or a value written by any side effect in the
hypothetical release sequence _X_ would head if it were a release operation.
A release fence _A_ synchronizes with an atomic operation _B_ that performs an
acquire operation on an atomic object _M_ if there exists an atomic operation
_X_ such that _A_ is sequenced before _X_, _X_ modifies _M_, and _B_ reads the
value written by _X_ or a value written by any side effect in the hypothetical
release sequence _X_ would head if it were a release operation.
An atomic operation _A_ that is a release operation on an atomic object _M_
synchronizes with an acquire fence _B_ if there exists some atomic operation _X_
on _M_ such that _X_ is sequenced before _B_ and reads the value written by _A_
or a value written by any side effect in the release sequence headed by _A_.
```rs
pub fn fence(order: Ordering);
```
_Effects_: Depending on the value of `order`, this operation:
- has no effects, if `order == Relaxed`;
- is an acquire fence, if `order == Acquire`;
- is a release fence, if `order == Release`;
- is both an acquire and a release fence, if `order == AcqRel`;
- is a sequentially consistent acquire and release fence, if `order == SeqCst`.
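The fence-to-fence rule above can be sketched as follows, with the comments mapping the code onto the letters _A_, _X_, _Y_ and _B_ used in the rule. The names `message_passing_with_fences`, `payload` and `ready` are assumptions made for this example; all atomic accesses are relaxed, and the ordering comes from the two fences alone.

```rs
use std::sync::atomic::{fence, AtomicBool, AtomicU32, Ordering};
use std::thread;

/// Publishes 7 using a release fence and reads it after an acquire fence.
/// (Hypothetical helper for illustration.)
pub fn message_passing_with_fences() -> u32 {
    let payload = AtomicU32::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            payload.store(7, Ordering::Relaxed);
            fence(Ordering::Release); // A: release fence, sequenced before X
            ready.store(true, Ordering::Relaxed); // X: modifies `ready`
        });
        let consumer = s.spawn(|| {
            // Y: relaxed load that eventually reads the value written by X.
            while !ready.load(Ordering::Relaxed) {
                std::hint::spin_loop();
            }
            fence(Ordering::Acquire); // B: acquire fence, Y is sequenced before B
            payload.load(Ordering::Relaxed)
        });
        consumer.join().unwrap()
    })
}
```

Since _A_ synchronizes with _B_, the payload store happens before the final load, which therefore returns `7`.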
```rs
pub fn compiler_fence(order: Ordering);
```
_Effects_: Equivalent to `fence(order)`, except that the resulting ordering
constraints are established only between a thread and a signal handler executed
in the same thread.
> _Note 1_: `compiler_fence` can be used to specify the order in which actions
> performed by the thread become visible to the signal handler. Compiler
> optimizations and reorderings of loads and stores are inhibited in the same
> way as with `fence` but the hardware fence instructions that `fence` would
> have inserted are not emitted.