You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
nomicon/src/atomics/specification.md

17 KiB

Specification

Below is a modified C++20 specification draft (as it was on 2022-07-16), edited to remove C++-only features like consume orderings and sig_atomic_t.

Note that although this has been checked, atomics are very difficult to get right and so there may be subtle mistakes. If you want to more formally check your software, read the [intro.races], [atomics.order] and [atomics.fences] sections of the real C++ specification.

Data races

The value of an object visible to a thread T at a particular point is the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below.

Note 1: In some cases, there might instead be undefined behavior. Much of this subclause is motivated by the desire to support atomic operations with explicit and detailed visibility constraints. However, it also implicitly supports a simpler view for more restricted programs.

Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.

The library defines a number of atomic operations and operations on mutexes that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another. A synchronization operation on one or more memory locations is either an acquire operation, a release operation, or both an acquire and release operation. A synchronization operation without an associated memory location is a fence and can be either an acquire fence, a release fence, or both an acquire and release fence. In addition, there are relaxed atomic operations, which are not synchronization operations, and atomic read-modify-write operations, which have special characteristics.

Note 2: For example, a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations. Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform an acquire operation on A. “Relaxed” atomic operations are not synchronization operations even though, like synchronization operations, they cannot contribute to data races.

All modifications to a particular atomic object M occur in some particular total order, called the modification order of M.

Note 3: There is a separate order for each atomic object. There is no requirement that these can be combined into a single total order for all objects. In general this will be impossible since different threads can observe modifications to different objects in inconsistent orders.

A release sequence headed by a release operation A on an atomic object M is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first operation is A, and every subsequent operation is an atomic read-modify-write operation.

Certain library calls synchronize with other library calls performed by another thread. For example, an atomic store-release synchronizes with a load-acquire that takes its value from the store.

Note 4: Except in the specified cases, reading a later value does not necessarily ensure visibility as described below. Such a requirement would sometimes interfere with efficient implementation.

Note 5: The specifications of the synchronization operations define when one reads the value written by another. For atomic objects, the definition is clear. All operations on a given mutex occur in a single total order. Each mutex acquisition “reads the value written” by the last mutex release.

An evaluation A happens before an evaluation B (or, equivalently, B happens after A) if either:

  • A is sequenced before B, or
  • A synchronizes with B, or
  • for some evaluation X, A happens before X and X happens before B.

An evaluation A strongly happens before an evaluation D if, either

  • A is sequenced before D, or
  • A synchronizes with D, and both A and D and sequentially consistent atomic operations, or
  • there are evaluations B and C such that A is sequenced before B, B happens before C, and C is sequenced before D, or
  • there is an evaluation B such that A strongly happens before B, and B strongly happens before D.

Note 11: Informally, if A strongly happens before B, then A appears to be evaluated before B in all contexts.

A visible side effect A on a scalar object M with respect to a value computation B of M satisfies the conditions:

  • A happens before B and
  • there is no other side effect X to M such that A happens before X and X happens before B.

The value of a non-atomic scalar object M, as determined by evaluation B, shall be the value stored by the visible side effect A.

Note 12: If there is ambiguity about which side effect to a non-atomic object is visible, then the behavior is either unspecified or undefined.

Note 13: This states that operations on ordinary objects are not visibly reordered. This is not actually detectable without data races, but it is necessary to ensure that data races, as defined below, and with suitable restrictions on the use of atomics, correspond to data races in a simple interleaved (sequentially consistent) execution.

The value of an atomic object M, as determined by evaluation B, shall be the value stored by some side effect A that modifies M, where B does not happen before A.

Note 14: The set of such side effects is also restricted by the rest of the rules described here, and in particular, by the coherence requirements below.

If an operation A that modifies an atomic object M happens before an operation B that modifies M, then A shall be earlier than B in the modification order of M.

Note 15: This requirement is known as write-write coherence.

If a value computation A of an atomic object M happens before a value computation B of M, and A takes its value from a side effect X on M, then the value computed by B shall either be the value stored by X or the value stored by a side effect Y on M, where Y follows X in the modification order of M.

Note 16: This requirement is known as read-read coherence.

If a value computation A of an atomic object M happens before an operation B that modifies M, then A shall take its value from a side effect X on M, where X precedes B in the modification order of M.

Note 17: This requirement is known as read-write coherence.

If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B shall take its value from X or from a side effect Y that follows X in the modification order of M.

Note 18: This requirement is known as write-read coherence.

Note 19: The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. This effectively makes the cache coherence guarantee provided by most hardware available to C++ atomic operations.

Note 20: The value observed by a load of an atomic depends on the “happens before” relation, which depends on the values observed by loads of atomics. The intended reading is that there must exist an association of atomic loads with modifications they observe that, together with suitably chosen modification orders and the “happens before” relation derived as described above, satisfy the resulting constraints as imposed here.

Two actions are potentially concurrent if

  • they are performed by different threads, or
  • they are unsequenced, at least one is performed by a signal handler, and they are not both performed by the same signal handler invocation.

The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

Note 21: It can be shown that programs that correctly use mutexes and SeqCst operations to prevent all data races and use no other synchronization operations behave as if the operations executed by their constituent threads were simply interleaved, with each value computation of an object being taken from the last side effect on that object in that interleaving. This is normally referred to as “sequential consistency”. However, this applies only to data-race-free programs, and data-race-free programs cannot observe most program transformations that do not change single-threaded program semantics. In fact, most single-threaded program transformations continue to be allowed, since any program that behaves differently as a result has undefined behavior.

Note 22: Compiler transformations that introduce assignments to a potentially shared memory location that would not be modified by the abstract machine are generally precluded by this document, since such an assignment might overwrite another assignment by a different thread in cases in which an abstract machine execution would not have encountered a data race. This includes implementations of data member assignment that overwrite adjacent members in separate memory locations. Reordering of atomic loads in cases in which the atomics in question might alias is also generally precluded, since this could violate the coherence rules.

Note 23: Transformations that introduce a speculative read of a potentially shared memory location might not preserve the semantics of the C++ program as defined in this document, since they potentially introduce a data race. However, they are typically valid in the context of an optimizing compiler that targets a specific machine with well-defined semantics for data races. They would be invalid for a hypothetical machine that is not tolerant of races or provides hardware race detection.

Atomic orderings

// in ::core::sync::atomic
#[non_exhaustive]
pub enum Ordering {
	Relaxed,
	Release,
	Acquire,
	AcqRel,
	SeqCst,
}

The enumeration Ordering specifies the detailed regular (non-atomic) memory synchronization order as defined in this document and may provide for operation ordering. Its enumerated values and their meanings are as follows:

  • Relaxed: no operation orders memory.
  • Release, AcqRel, and SeqCst: a store operation performs a release operation on the affected memory location.
  • Acquire, AcqRel, and SeqCst: a load operation performs an acquire operation on the affected memory location.

Note 2: Atomic operations specifying Relaxed are relaxed with respect to memory ordering. Implementations must still guarantee that any given atomic access to a particular atomic object be indivisible with respect to all other atomic accesses to that object.

An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.

An atomic operation A on some atomic object M is coherence-ordered before another atomic operation B on M if

  • A is a modification, and B reads the value stored by A, or
  • A precedes B in the modification order of M, or
  • A and B are not the same atomic read-modify-write operation, and there exists an atomic modification X of M such that A reads the value stored by X and X precedes B in the modification order of M, or
  • there exists an atomic modification X of M such that A is coherence-ordered before X and X is coherence-ordered before B.

There is a single total order S on all SeqCst operations, including fences, that satisfies the following constraints. First, if A and B are SeqCst operations and A strongly happens before B, then A precedes B in S. Second, for every pair of atomic operations A and B on an object M, where A is coherence-ordered before B, the following four conditions are required to be satisfied by S:

  • if A and B are both SeqCst operations, then A precedes B in S; and
  • if A is a SeqCst operation and B happens before a SeqCst fence Y, then A precedes Y in S; and
  • if a SeqCst fence X happens before A and B is a SeqCst operation, then X precedes B in S; and
  • if an SeqCst fence X happens before A and B happens before a SeqCst fence Y, then X precedes Y in S.

Note 3: This definition ensures that S is consistent with the modification order of any atomic object M. It also ensures that a SeqCst load A of M gets its value either from the last modification of M that precedes A in S or from some non-SeqCst modification of M that does not happen before any modification of M that precedes A in S.

Note 4: We do not require that S be consistent with “happens before”. This allows more efficient implementation of Acquire and Release on some machine architectures. It can produce surprising results when these are mixed with SeqCst accesses.

Note 5: SeqCst ensures sequential consistency only for a program that is free of data races and uses exclusively SeqCst atomic operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In many cases, SeqCst atomic operations are reorderable with respect to other atomic operations performed by the same thread.

Implementations should ensure that no “out-of-thin-air” values are computed that circularly depend on their own computation.

Note 6: For example, with x and y initially zero,

// Thread 1:
let r1 = y.load(atomic::Ordering::Relaxed);
x.store(r1, atomic::Ordering::Relaxed);
// Thread 2:
let r2 = x.load(atomic::Ordering::Relaxed);
y.store(r2, atomic::Ordering::Relaxed);

this recommendation discourages producing r1 == r2 == 42, since the store of 42 to y is only possible if the store to x stores 42, which circularly depends on the store to y storing 42. Note that without this restriction, such an execution is possible.

Note 7: The recommendation similarly disallows r1 == r2 == 42 in the following example, with x and y again initially zero:

// Thread 1:
let r1 = x.load(atomic::Ordering::Relaxed);
if r1 == 42 {
    y.store(42, atomic::Ordering::Relaxed);
}
// Thread 2:
let r2 = y.load(atomic::Ordering::Relaxed);
if r2 == 42 {
    x.store(42, atomic::Ordering::Relaxed);
}

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

Atomic fences

This subclause introduces synchronization primitives called fences. Fences can have acquire semantics, release semantics, or both. A fence with acquire semantics is called an acquire fence. A fence with release semantics is called a release fence.

A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.

A release fence A synchronizes with an atomic operation B that performs an acquire operation on an atomic object M if there exists an atomic operation X such that A is sequenced before X, X modifies M, and B reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.

An atomic operation A that is a release operation on an atomic object M synchronizes with an acquire fence B if there exists some atomic operation X on M such that X is sequenced before B and reads the value written by A or a value written by any side effect in the release sequence headed by A.

pub fn fence(order: Ordering);

Effects: Depending on the value of order, this operation:

  • has no effects, if order == Relaxed;
  • is an acquire fence, if order == Acquire;
  • is a release fence, if order == Release;
  • is both an acquire and a release fence, if order == AcqRel;
  • is a sequentially consistent acquire and release fence, if order == SeqCst.
pub fn compiler_fence(order: Ordering);

Effects: Equivalent to fence(order), except that the resulting ordering constraints are established only between a thread and a signal handler executed in the same thread.

Note 1: compiler_fence can be used to specify the order in which actions performed by the thread become visible to the signal handler. Compiler optimizations and reorderings of loads and stores are inhibited in the same way as with fence but the hardware fence instructions that fence would have inserted are not emitted.