From 544283f8b9d8bf50dd66b3e78b32d16386e457cd Mon Sep 17 00:00:00 2001 From: ThePuzzlemaker Date: Sat, 18 Nov 2023 20:39:31 -0600 Subject: [PATCH] Improve wording and description of atomics in the `Arc` chapter --- src/arc-mutex/arc-clone.md | 20 +++++++------ src/arc-mutex/arc-drop.md | 56 ++++++++++++++++--------------------- src/arc-mutex/arc-layout.md | 40 ++++++++++++++++---------- 3 files changed, 61 insertions(+), 55 deletions(-) diff --git a/src/arc-mutex/arc-clone.md b/src/arc-mutex/arc-clone.md index 1adc6c9..ea9a981 100644 --- a/src/arc-mutex/arc-clone.md +++ b/src/arc-mutex/arc-clone.md @@ -21,14 +21,18 @@ We can update the atomic reference count as follows: let old_rc = inner.rc.fetch_add(1, Ordering::???); ``` -But what ordering should we use here? We don't really have any code that will -need atomic synchronization when cloning, as we do not modify the internal value -while cloning. Thus, we can use a Relaxed ordering here, which implies no -happens-before relationship but is atomic. When `Drop`ping the Arc, however, -we'll need to atomically synchronize when decrementing the reference count. This -is described more in [the section on the `Drop` implementation for -`Arc`](arc-drop.md). For more information on atomic relationships and Relaxed -ordering, see [the section on atomics](../atomics.md). +But what ordering should we use here? We don't really have any code +that will need atomic synchronization when cloning, as we do not +modify the internal value while cloning. Additionally, we already know +the reference count is at least one, by virtue of having a +`&Arc`--and it will stay that way in sound code as long as that +reference still lives. Thus, we can use a Relaxed ordering here, which +implies no happens-before relationship but is atomic. When `Drop`ping +the Arc, however, we'll need to atomically synchronize when +decrementing the reference count. This is described more in [the +section on the `Drop` implementation for `Arc`](arc-drop.md). For more +information on atomic relationships and Relaxed ordering, see [the +section on atomics](../atomics.md). Thus, the code becomes this: diff --git a/src/arc-mutex/arc-drop.md b/src/arc-mutex/arc-drop.md index 3dd9f03..65dd8f0 100644 --- a/src/arc-mutex/arc-drop.md +++ b/src/arc-mutex/arc-drop.md @@ -32,45 +32,37 @@ if inner.rc.fetch_sub(1, Ordering::Release) != 1 { } ``` -We then need to create an atomic fence to prevent reordering of the use of the -data and deletion of the data. As described in [the standard library's -implementation of `Arc`][3]: -> This fence is needed to prevent reordering of use of the data and deletion of -> the data. Because it is marked `Release`, the decreasing of the reference -> count synchronizes with this `Acquire` fence. This means that use of the data -> happens before decreasing the reference count, which happens before this -> fence, which happens before the deletion of the data. -> -> As explained in the [Boost documentation][1], -> -> > It is important to enforce any possible access to the object in one -> > thread (through an existing reference) to *happen before* deleting -> > the object in a different thread. This is achieved by a "release" -> > operation after dropping a reference (any access to the object -> > through this reference must obviously happened before), and an -> > "acquire" operation before deleting the object. -> -> In particular, while the contents of an Arc are usually immutable, it's -> possible to have interior writes to something like a Mutex. Since a Mutex -> is not acquired when it is deleted, we can't rely on its synchronization logic -> to make writes in thread A visible to a destructor running in thread B. -> -> Also note that the Acquire fence here could probably be replaced with an -> Acquire load, which could improve performance in highly-contended situations. -> See [2]. -> -> [1]: https://www.boost.org/doc/libs/1_55_0/doc/html/atomic/usage_examples.html -> [2]: https://github.com/rust-lang/rust/pull/41714 -[3]: https://github.com/rust-lang/rust/blob/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/alloc/src/sync.rs#L1440-L1467 +Why do we use `Release` here? Well, the `Release` ordering ensures +that any writes to the data from other threads happen-before this +decrementing of the reference count. + +If we succeed, however, we need further guarantees. We must use an +`Acquire` fence, which ensures that the decrement of the reference +count happens-before our deletion of the data. To do this, we do the following: -```rust -# use std::sync::atomic::Ordering; + +```rust,ignore use std::sync::atomic; atomic::fence(Ordering::Acquire); ``` +We could have used `AcqRel` for the `fetch_sub` operation, but that +would give more guarantees than we need when we *don't* succeed. On +some platforms, using an `AcqRel` ordering for *every* `Drop` may have +an impact on performance. While this is a niche optimization, it can't +hurt--also, it helps to further convey the guarantees necessary not +only to the processor but also to readers of the code. + +With the combination of these two synchronization points, we ensure +that, to our thread, the following order of events manifests: +- Data used by our/other thread +- Reference count decremented by our thread +- Data deleted by our thread +This way, we ensure that our data is not dropped while it is still +in use. + Finally, we can drop the data itself. We use `Box::from_raw` to drop the boxed `ArcInner` and its data. This takes a `*mut T` and not a `NonNull`, so we must convert using `NonNull::as_ptr`. diff --git a/src/arc-mutex/arc-layout.md b/src/arc-mutex/arc-layout.md index fabfdca..5e86543 100644 --- a/src/arc-mutex/arc-layout.md +++ b/src/arc-mutex/arc-layout.md @@ -26,7 +26,7 @@ Naively, it would look something like this: use std::sync::atomic; pub struct Arc { - ptr: *mut ArcInner, + ptr: *const ArcInner, } pub struct ArcInner { @@ -35,24 +35,34 @@ pub struct ArcInner { } ``` -This would compile, however it would be incorrect. First of all, the compiler -will give us too strict variance. For example, an `Arc<&'static str>` couldn't -be used where an `Arc<&'a str>` was expected. More importantly, it will give -incorrect ownership information to the drop checker, as it will assume we don't -own any values of type `T`. As this is a structure providing shared ownership of -a value, at some point there will be an instance of this structure that entirely -owns its data. See [the chapter on ownership and lifetimes](../ownership.md) for -all the details on variance and drop check. +This would compile, however it would be incorrect--it will give +incorrect ownership information to the drop checker, as it will assume +we don't own any values of type `T`. As this is a structure providing +shared ownership of a value, at some point there will be an instance +of this structure that entirely owns its data. -To fix the first problem, we can use `NonNull`. Note that `NonNull` is a -wrapper around a raw pointer that declares that: +To fix the problem, we can include a `PhantomData` marker containing an +`ArcInner`. This will tell the drop checker that we have some notion of +ownership of a value of `ArcInner` (which itself contains some `T`). + +We should also use `NonNull`, as it helps convey to the reader, and +the compiler, more guarantees about our inner pointer. Note that +`NonNull` is a wrapper around a raw pointer that declares that: -* We are covariant over `T` +* We are covariant over `T`. This property is important to retain from + a `*const T` so that, for example, we could use an `Arc<&'static T>` + where an `Arc<&'a T>` was needed. This is perhaps a bit contrived but + there are cases when this could be useful, especially when dealing + with structures generic over lifetimes. * Our pointer is never null -To fix the second problem, we can include a `PhantomData` marker containing an -`ArcInner`. This will tell the drop checker that we have some notion of -ownership of a value of `ArcInner` (which itself contains some `T`). +For more information on variance and the drop check, see [the chapter +on ownership and lifetimes](../ownership.md). + +This can lead to some helpful compiler optimizations for layout (for +example, the null pointer optimization for `Option>` would +carry forth to an `Option>` and perhaps even machine code +optimizations in certain cases. With these changes we get our final structure: