From 544283f8b9d8bf50dd66b3e78b32d16386e457cd Mon Sep 17 00:00:00 2001
From: ThePuzzlemaker <tpzker@thepuzzlemaker.info>
Date: Sat, 18 Nov 2023 20:39:31 -0600
Subject: [PATCH] Improve wording and description of atomics in the `Arc`
 chapter

---
 src/arc-mutex/arc-clone.md  | 20 +++++++------
 src/arc-mutex/arc-drop.md   | 56 ++++++++++++++++---------------------
 src/arc-mutex/arc-layout.md | 40 ++++++++++++++++----------
 3 files changed, 61 insertions(+), 55 deletions(-)
diff --git a/src/arc-mutex/arc-clone.md b/src/arc-mutex/arc-clone.md
index 1adc6c9..ea9a981 100644
--- a/src/arc-mutex/arc-clone.md
+++ b/src/arc-mutex/arc-clone.md
@@ -21,14 +21,18 @@ We can update the atomic reference count as follows:
 let old_rc = inner.rc.fetch_add(1, Ordering::???);
 ```
 
-But what ordering should we use here? We don't really have any code that will
-need atomic synchronization when cloning, as we do not modify the internal value
-while cloning. Thus, we can use a Relaxed ordering here, which implies no
-happens-before relationship but is atomic. When `Drop`ping the Arc, however,
-we'll need to atomically synchronize when decrementing the reference count. This
-is described more in [the section on the `Drop` implementation for
-`Arc`](arc-drop.md). For more information on atomic relationships and Relaxed
-ordering, see [the section on atomics](../atomics.md).
+But what ordering should we use here? We don't really have any code
+that will need atomic synchronization when cloning, as we do not
+modify the internal value while cloning. Additionally, we already know
+the reference count is at least one, by virtue of having a
+`&Arc<T>`--and it will stay that way in sound code as long as that
+reference still lives. Thus, we can use a Relaxed ordering here, which
+implies no happens-before relationship but is atomic. When `Drop`ping
+the Arc, however, we'll need to atomically synchronize when
+decrementing the reference count. This is described more in [the
+section on the `Drop` implementation for `Arc`](arc-drop.md). For more
+information on atomic relationships and Relaxed ordering, see [the
+section on atomics](../atomics.md).
 
 Thus, the code becomes this:
 
diff --git a/src/arc-mutex/arc-drop.md b/src/arc-mutex/arc-drop.md
index 3dd9f03..65dd8f0 100644
--- a/src/arc-mutex/arc-drop.md
+++ b/src/arc-mutex/arc-drop.md
@@ -32,45 +32,37 @@ if inner.rc.fetch_sub(1, Ordering::Release) != 1 {
 }
 ```
 
-We then need to create an atomic fence to prevent reordering of the use of the
-data and deletion of the data. As described in [the standard library's
-implementation of `Arc`][3]:
-> This fence is needed to prevent reordering of use of the data and deletion of
-> the data. Because it is marked `Release`, the decreasing of the reference
-> count synchronizes with this `Acquire` fence. This means that use of the data
-> happens before decreasing the reference count, which happens before this
-> fence, which happens before the deletion of the data.
->
-> As explained in the [Boost documentation][1],
->
-> > It is important to enforce any possible access to the object in one
-> > thread (through an existing reference) to *happen before* deleting
-> > the object in a different thread. This is achieved by a "release"
-> > operation after dropping a reference (any access to the object
-> > through this reference must obviously happened before), and an
-> > "acquire" operation before deleting the object.
->
-> In particular, while the contents of an Arc are usually immutable, it's
-> possible to have interior writes to something like a Mutex<T>. Since a Mutex
-> is not acquired when it is deleted, we can't rely on its synchronization logic
-> to make writes in thread A visible to a destructor running in thread B.
->
-> Also note that the Acquire fence here could probably be replaced with an
-> Acquire load, which could improve performance in highly-contended situations.
-> See [2].
->
-> [1]: https://www.boost.org/doc/libs/1_55_0/doc/html/atomic/usage_examples.html
-> [2]: https://github.com/rust-lang/rust/pull/41714
-[3]: https://github.com/rust-lang/rust/blob/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/alloc/src/sync.rs#L1440-L1467
+Why do we use `Release` here? Well, the `Release` ordering ensures
+that any use of the data through this `Arc` happens-before the
+decrement, as seen by a thread that later synchronizes with it.
+
+If we succeed (ours was the last reference), we need more guarantees.
+We must use an `Acquire` fence, which ensures that every earlier
+`Release` decrement happens-before our deletion of the data.
 
 To do this, we do the following:
 
-```rust
-# use std::sync::atomic::Ordering;
+<!-- ignore: simplified code -->
+```rust,ignore
 use std::sync::atomic;
 atomic::fence(Ordering::Acquire);
 ```
 
+We could have used `AcqRel` for the `fetch_sub` operation, but that
+would give more guarantees than we need when we *don't* succeed. On
+some platforms, using an `AcqRel` ordering for *every* `Drop` may have
+an impact on performance. While this is a niche optimization, it can't
+hurt, and it also helps convey the necessary guarantees not only to
+the processor but also to readers of the code.
+
+With these two synchronization points, our thread observes this order:
+
+- Data used by our thread or other threads
+- Reference count decremented by our thread
+- Data deleted by our thread
+
+This way, we ensure that our data is not dropped while it is still in use.
+
 Finally, we can drop the data itself. We use `Box::from_raw` to drop the boxed
 `ArcInner<T>` and its data. This takes a `*mut T` and not a `NonNull<T>`, so we
 must convert using `NonNull::as_ptr`.
diff --git a/src/arc-mutex/arc-layout.md b/src/arc-mutex/arc-layout.md
index fabfdca..5e86543 100644
--- a/src/arc-mutex/arc-layout.md
+++ b/src/arc-mutex/arc-layout.md
@@ -26,7 +26,7 @@ Naively, it would look something like this:
 use std::sync::atomic;
 
 pub struct Arc<T> {
-    ptr: *mut ArcInner<T>,
+    ptr: *const ArcInner<T>,
 }
 
 pub struct ArcInner<T> {
@@ -35,24 +35,34 @@ pub struct ArcInner<T> {
 }
 ```
 
-This would compile, however it would be incorrect. First of all, the compiler
-will give us too strict variance. For example, an `Arc<&'static str>` couldn't
-be used where an `Arc<&'a str>` was expected. More importantly, it will give
-incorrect ownership information to the drop checker, as it will assume we don't
-own any values of type `T`. As this is a structure providing shared ownership of
-a value, at some point there will be an instance of this structure that entirely
-owns its data. See [the chapter on ownership and lifetimes](../ownership.md) for
-all the details on variance and drop check.
+This would compile; however, it would be incorrect. It gives
+incorrect ownership information to the drop checker, which will assume
+we don't own any values of type `T`. As this is a structure providing
+shared ownership of a value, at some point there will be an instance
+of this structure that entirely owns its data.
 
-To fix the first problem, we can use `NonNull<T>`. Note that `NonNull<T>` is a
-wrapper around a raw pointer that declares that:
+To fix the problem, we can include a `PhantomData` marker containing an
+`ArcInner<T>`. This will tell the drop checker that we have some notion of
+ownership of a value of `ArcInner<T>` (which itself contains some `T`).
+
+We should also use `NonNull<T>`, as it helps convey to the reader, and
+the compiler, more guarantees about our inner pointer. Note that
+`NonNull<T>` is a wrapper around a raw pointer that declares that:
 
-* We are covariant over `T`
+* We are covariant over `T`. This property is important to retain from
+  a `*const T` so that, for example, we could use an `Arc<&'static T>`
+  where an `Arc<&'a T>` was needed. This is perhaps a bit contrived but
+  there are cases when this could be useful, especially when dealing
+  with structures generic over lifetimes.
 * Our pointer is never null
 
-To fix the second problem, we can include a `PhantomData` marker containing an
-`ArcInner<T>`. This will tell the drop checker that we have some notion of
-ownership of a value of `ArcInner<T>` (which itself contains some `T`).
+For more information on variance and the drop check, see [the chapter
+on ownership and lifetimes](../ownership.md).
+
+The non-null guarantee can lead to some helpful compiler optimizations
+for layout (for example, the null pointer optimization for
+`Option<NonNull<T>>` carries over to an `Option<Arc<T>>`) and perhaps
+even machine code optimizations in certain cases.
 
 With these changes we get our final structure: