Merge branch 'master' into atomics

pull/378/head
SabrinaJewson 1 year ago
commit 19b059a6eb

@ -35,4 +35,4 @@ git-repository-url = "https://github.com/rust-lang/nomicon"
"./atomics.html" = "./atomics/atomics.html" "./atomics.html" = "./atomics/atomics.html"
[rust] [rust]
edition = "2018" edition = "2021"

@ -1,10 +1,108 @@
# Beneath std # Beneath `std`
This section documents (or will document) features that are provided by the standard library and This section documents features that are normally provided by the `std` crate and
that `#![no_std]` developers have to deal with (i.e. provide) to build `#![no_std]` binary crates. A that `#![no_std]` developers have to deal with (i.e. provide) to build
(likely incomplete) list of such features is shown below: `#![no_std]` binary crates.
- `#[lang = "eh_personality"]` ## Using `libc`
- `#[lang = "start"]`
- `#[lang = "termination"]` In order to build a `#![no_std]` executable we will need `libc` as a dependency.
- `#[panic_implementation]` We can specify this using our `Cargo.toml` file:
```toml
[dependencies]
libc = { version = "0.2.146", default-features = false }
```
Note that the default features have been disabled. This is a critical step -
**the default features of `libc` include the `std` crate and so must be
disabled.**
Alternatively, we can use the unstable `rustc_private` feature together
with an `extern crate libc;` declaration as shown in the examples below.
## Writing an executable without `std`
We will probably need a nightly version of the compiler to produce
a `#![no_std]` executable because on many platforms, we have to provide the
`eh_personality` [lang item], which is unstable.
Controlling the entry point is possible in two ways: the `#[start]` attribute,
or overriding the default shim for the C `main` function with your own.
Additionally, it's required to define a [panic handler function](panic-handler.html).
The function marked `#[start]` is passed the command line parameters
in the same format as C (aside from the exact integer types being used):
```rust
#![feature(start, lang_items, core_intrinsics, rustc_private)]
#![allow(internal_features)]
#![no_std]
// Necessary for `panic = "unwind"` builds on some platforms.
#![feature(panic_unwind)]
extern crate unwind;
// Pull in the system libc library for what crt0.o likely requires.
extern crate libc;
use core::panic::PanicInfo;
// Entry point for this program.
#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
0
}
// These functions are used by the compiler, but not for an empty program like this.
// They are normally provided by `std`.
#[lang = "eh_personality"]
fn rust_eh_personality() {}
#[panic_handler]
fn panic_handler(_info: &PanicInfo) -> ! { core::intrinsics::abort() }
```
To override the compiler-inserted `main` shim, we have to disable it
with `#![no_main]` and then create the appropriate symbol with the
correct ABI and the correct name, which requires overriding the
compiler's name mangling too:
```rust
#![feature(lang_items, core_intrinsics, rustc_private)]
#![allow(internal_features)]
#![no_std]
#![no_main]
// Necessary for `panic = "unwind"` builds on some platforms.
#![feature(panic_unwind)]
extern crate unwind;
// Pull in the system libc library for what crt0.o likely requires.
extern crate libc;
use core::ffi::{c_char, c_int};
use core::panic::PanicInfo;
// Entry point for this program.
#[no_mangle] // ensure that this symbol is included in the output as `main`
extern "C" fn main(_argc: c_int, _argv: *const *const c_char) -> c_int {
0
}
// These functions are used by the compiler, but not for an empty program like this.
// They are normally provided by `std`.
#[lang = "eh_personality"]
fn rust_eh_personality() {}
#[panic_handler]
fn panic_handler(_info: &PanicInfo) -> ! { core::intrinsics::abort() }
```
If you are working with a target that doesn't have binary releases of the
standard library available via rustup (this probably means you are building the
`core` crate yourself) and need compiler-rt intrinsics (i.e. you are probably
getting linker errors when building an executable:
``undefined reference to `__aeabi_memcpy'``), you need to manually link to the
[`compiler_builtins` crate] to get those intrinsics and solve the linker errors.
[`compiler_builtins` crate]: https://crates.io/crates/compiler_builtins
[lang item]: https://doc.rust-lang.org/nightly/unstable-book/language-features/lang-items.html

@ -159,7 +159,7 @@ impl<'a, T> Iterator for IterMut<'a, T> {
type Item = &'a mut T; type Item = &'a mut T;
fn next(&mut self) -> Option<Self::Item> { fn next(&mut self) -> Option<Self::Item> {
let slice = mem::replace(&mut self.0, &mut []); let slice = mem::take(&mut self.0);
if slice.is_empty() { return None; } if slice.is_empty() { return None; }
let (l, r) = slice.split_at_mut(1); let (l, r) = slice.split_at_mut(1);
@ -170,7 +170,7 @@ impl<'a, T> Iterator for IterMut<'a, T> {
impl<'a, T> DoubleEndedIterator for IterMut<'a, T> { impl<'a, T> DoubleEndedIterator for IterMut<'a, T> {
fn next_back(&mut self) -> Option<Self::Item> { fn next_back(&mut self) -> Option<Self::Item> {
let slice = mem::replace(&mut self.0, &mut []); let slice = mem::take(&mut self.0);
if slice.is_empty() { return None; } if slice.is_empty() { return None; }
let new_len = slice.len() - 1; let new_len = slice.len() - 1;
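The `mem::replace` to `mem::take` change in the two hunks above relies on `&mut [T]` implementing `Default` as the empty slice, so both calls leave `&mut []` behind. The following is a minimal sketch of that idea, not the book's full `IterMut`: the `double_all` helper is hypothetical, and `split_first_mut` is used here in place of the `split_at_mut(1)` call shown in the hunk.

```rust
use std::mem;

// Simplified stand-in for the book's `IterMut`; forward iteration only.
struct IterMut<'a, T>(&'a mut [T]);

impl<'a, T> Iterator for IterMut<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<Self::Item> {
        // `mem::take` swaps in `Default::default()`, which for `&mut [T]`
        // is the empty slice, matching `mem::replace(&mut self.0, &mut [])`.
        let slice = mem::take(&mut self.0);
        let (first, rest) = slice.split_first_mut()?;
        self.0 = rest;
        Some(first)
    }
}

// Hypothetical helper used only to exercise the iterator.
fn double_all(values: &mut [i32]) {
    for v in IterMut(values) {
        *v *= 2;
    }
}

fn main() {
    let mut data = [1, 2, 3];
    double_all(&mut data);
    assert_eq!(data, [2, 4, 6]);
}
```

Taking the slice out of `self` first is what lets the returned reference carry the full `'a` lifetime instead of being tied to the `&mut self` borrow.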

@ -250,7 +250,7 @@ fn main() {
inspector: None, inspector: None,
days: Box::new(1), days: Box::new(1),
}; };
world.inspector = Some(Inspector(&world.days, "gatget")); world.inspector = Some(Inspector(&world.days, "gadget"));
} }
``` ```

@ -161,9 +161,9 @@ impl<'a, T> Hole<'a, T> {
unsafe { unsafe {
let elt = ptr::read(&data[pos]); let elt = ptr::read(&data[pos]);
Hole { Hole {
data: data, data,
elt: Some(elt), elt: Some(elt),
pos: pos, pos,
} }
} }
} }
@ -172,7 +172,7 @@ impl<'a, T> Hole<'a, T> {
fn removed(&self) -> &T { self.elt.as_ref().unwrap() } fn removed(&self) -> &T { self.elt.as_ref().unwrap() }
unsafe fn get(&self, index: usize) -> &T { &self.data[index] } fn get(&self, index: usize) -> &T { &self.data[index] }
unsafe fn move_to(&mut self, index: usize) { unsafe fn move_to(&mut self, index: usize) {
let index_ptr: *const _ = &self.data[index]; let index_ptr: *const _ = &self.data[index];

@ -258,7 +258,7 @@ pub extern "C" fn hello_from_rust() {
# fn main() {} # fn main() {}
``` ```
The `extern "C"` makes this function adhere to the C calling convention, as discussed above in "[Foreign Calling Conventions]". The `extern "C"` makes this function adhere to the C calling convention, as discussed below in "[Foreign Calling Conventions]".
The `no_mangle` attribute turns off Rust's name mangling, so that it has a well defined symbol to link to. The `no_mangle` attribute turns off Rust's name mangling, so that it has a well defined symbol to link to.
Then, to compile Rust code as a shared library that can be called from C, add the following to your `Cargo.toml`: Then, to compile Rust code as a shared library that can be called from C, add the following to your `Cargo.toml`:
@ -586,6 +586,7 @@ are:
* `aapcs` * `aapcs`
* `cdecl` * `cdecl`
* `fastcall` * `fastcall`
* `thiscall`
* `vectorcall` * `vectorcall`
This is currently hidden behind the `abi_vectorcall` gate and is subject to change. This is currently hidden behind the `abi_vectorcall` gate and is subject to change.
* `Rust` * `Rust`
@ -659,7 +660,8 @@ Certain Rust types are defined to never be `null`. This includes references (`&T
`&mut T`), boxes (`Box<T>`), and function pointers (`extern "abi" fn()`). When `&mut T`), boxes (`Box<T>`), and function pointers (`extern "abi" fn()`). When
interfacing with C, pointers that might be `null` are often used, which would seem to interfacing with C, pointers that might be `null` are often used, which would seem to
require some messy `transmute`s and/or unsafe code to handle conversions to/from Rust types. require some messy `transmute`s and/or unsafe code to handle conversions to/from Rust types.
However, the language provides a workaround. However, trying to construct/work with these invalid values **is undefined behavior**,
so you should use the following workaround instead.
As a special case, an `enum` is eligible for the "nullable pointer optimization" if it contains As a special case, an `enum` is eligible for the "nullable pointer optimization" if it contains
exactly two variants, one of which contains no data and the other contains a field of one of the exactly two variants, one of which contains no data and the other contains a field of one of the
@ -720,17 +722,20 @@ No `transmute` required!
## FFI and unwinding ## FFI and unwinding
It's important to be mindful of unwinding when working with FFI. Each It's important to be mindful of unwinding when working with FFI. Most
non-`Rust` ABI comes in two variants, one with `-unwind` suffix and one without. If ABI strings come in two variants, one with an `-unwind` suffix and one without.
you expect Rust `panic`s or foreign (e.g. C++) exceptions to cross an FFI The `Rust` ABI always permits unwinding, so there is no `Rust-unwind` ABI.
boundary, that boundary must use the appropriate `-unwind` ABI string (note
that compiling with `panic=abort` will still cause `panic!` to immediately
abort the process, regardless of which ABI is specified by the function that
`panic`s).
If you expect Rust `panic`s or foreign (e.g. C++) exceptions to cross an FFI
boundary, that boundary must use the appropriate `-unwind` ABI string.
Conversely, if you do not expect unwinding to cross an ABI boundary, use one of Conversely, if you do not expect unwinding to cross an ABI boundary, use one of
the non-`unwind` ABI strings (other than `Rust`, which always permits the non-`unwind` ABI strings.
unwinding). If an unwinding operation does encounter an ABI boundary that is
> Note: Compiling with `panic=abort` will still cause `panic!` to immediately
abort the process, regardless of which ABI is specified by the function that
`panic`s.
If an unwinding operation does encounter an ABI boundary that is
not permitted to unwind, the behavior depends on the source of the unwinding not permitted to unwind, the behavior depends on the source of the unwinding
(Rust `panic` or a foreign exception): (Rust `panic` or a foreign exception):
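As an illustration of the permitted case described in this hunk (a Rust `panic` crossing a boundary declared with an `-unwind` ABI string), here is a small sketch. It assumes a toolchain where `"C-unwind"` is stable (Rust 1.71 or later) and the default `panic=unwind` strategy; the function names `may_panic` and `call_and_catch` are hypothetical.

```rust
use std::panic;

// Hypothetical callback. The "C-unwind" ABI string permits a Rust panic
// to unwind out of this function instead of aborting the process.
extern "C-unwind" fn may_panic(fail: bool) -> i32 {
    if fail {
        panic!("unwinding across a C-unwind boundary");
    }
    7
}

// Calls the function and converts a caught panic into `None`.
fn call_and_catch(fail: bool) -> Option<i32> {
    panic::catch_unwind(|| may_panic(fail)).ok()
}

fn main() {
    assert_eq!(call_and_catch(false), Some(7));
    // With `panic=unwind`, the panic is caught on the Rust side.
    assert_eq!(call_and_catch(true), None);
}
```

Under `panic=abort` the second call would abort the process rather than return `None`, as the note in this hunk explains.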

@ -39,7 +39,7 @@ Topics that are within the scope of this book include: the meaning of (un)safety
The Rustonomicon is not a place to exhaustively describe the semantics and guarantees of every single API in the standard library, nor is it a place to exhaustively describe every feature of Rust. The Rustonomicon is not a place to exhaustively describe the semantics and guarantees of every single API in the standard library, nor is it a place to exhaustively describe every feature of Rust.
Unless otherwise noted, Rust code in this book uses the Rust 2018 edition. Unless otherwise noted, Rust code in this book uses the Rust 2021 edition.
[trpl]: ../book/index.html [trpl]: ../book/index.html
[ref]: ../reference/index.html [ref]: ../reference/index.html

@ -134,10 +134,10 @@ impl<T> Rc<T> {
// Wouldn't it be nice if heap::allocate worked like this? // Wouldn't it be nice if heap::allocate worked like this?
let ptr = heap::allocate::<RcBox<T>>(); let ptr = heap::allocate::<RcBox<T>>();
ptr::write(ptr, RcBox { ptr::write(ptr, RcBox {
data: data, data,
ref_count: 1, ref_count: 1,
}); });
Rc { ptr: ptr } Rc { ptr }
} }
} }
@ -194,7 +194,7 @@ pub fn scoped<'a, F>(f: F) -> JoinGuard<'a>
``` ```
Here `f` is some closure for the other thread to execute. Saying that Here `f` is some closure for the other thread to execute. Saying that
`F: Send +'a` is saying that it closes over data that lives for `'a`, and it `F: Send + 'a` is saying that it closes over data that lives for `'a`, and it
either owns that data or the data was Sync (implying `&data` is Send). either owns that data or the data was Sync (implying `&data` is Send).
Because JoinGuard has a lifetime, it keeps all the data it closes over Because JoinGuard has a lifetime, it keeps all the data it closes over

@ -65,7 +65,7 @@ fn main() {
The lifetime system is forced to extend the `&mut foo` to have lifetime `'c`, The lifetime system is forced to extend the `&mut foo` to have lifetime `'c`,
due to the lifetime of `loan` and `mutate_and_share`'s signature. Then when we due to the lifetime of `loan` and `mutate_and_share`'s signature. Then when we
try to call `share`, and it sees we're trying to alias that `&'c mut foo` and try to call `share`, it sees we're trying to alias that `&'c mut foo` and
blows up in our face! blows up in our face!
This program is clearly correct according to the reference semantics we actually This program is clearly correct according to the reference semantics we actually
@ -74,9 +74,9 @@ care about, but the lifetime system is too coarse-grained to handle that.
## Improperly reduced borrows ## Improperly reduced borrows
The following code fails to compile, because Rust sees that a variable, `map`, The following code fails to compile, because Rust sees that a variable, `map`,
is borrowed twice, and can not infer that the first borrow stops to be needed is borrowed twice, and can not infer that the first borrow ceases to be needed
before the second one occurs. This is caused by Rust conservatively falling back before the second one occurs. This is caused by Rust conservatively falling back
to using a whole scope for the first borow. This will eventually get fixed. to using a whole scope for the first borrow. This will eventually get fixed.
```rust,compile_fail ```rust,compile_fail
# use std::collections::HashMap; # use std::collections::HashMap;

@ -55,7 +55,7 @@ likely desugar to the following:
let y: &'b i32 = &'b x; let y: &'b i32 = &'b x;
'c: { 'c: {
// ditto on 'c // ditto on 'c
let z: &'c &'b i32 = &'c y; let z: &'c &'b i32 = &'c y; // "a reference to a reference to an i32" (with lifetimes annotated)
} }
} }
} }

@ -56,24 +56,26 @@ compiled as normal.)
## repr(transparent) ## repr(transparent)
This can only be used on structs with a single non-zero-sized field (there may `#[repr(transparent)]` can only be used on a struct or single-variant enum that has a single non-zero-sized field (there may be additional zero-sized fields).
be additional zero-sized fields). The effect is that the layout and ABI of the The effect is that the layout and ABI of the whole struct/enum is guaranteed to be the same as that one field.
whole struct is guaranteed to be the same as that one field.
> NOTE: There's a `transparent_unions` nightly feature to apply `repr(transparent)` to unions,
> but it hasn't been stabilized due to design concerns. See the [tracking issue][issue-60405] for more details.
The goal is to make it possible to transmute between the single field and the The goal is to make it possible to transmute between the single field and the
struct. An example of that is [`UnsafeCell`], which can be transmuted into struct/enum. An example of that is [`UnsafeCell`], which can be transmuted into
the type it wraps ([`UnsafeCell`] also uses the unstable [no_niche][no-niche-pull], the type it wraps ([`UnsafeCell`] also uses the unstable [no_niche][no-niche-pull],
so its ABI is not actually guaranteed to be the same when nested in other types). so its ABI is not actually guaranteed to be the same when nested in other types).
Also, passing the struct through FFI where the inner field type is expected on Also, passing the struct/enum through FFI where the inner field type is expected on
the other side is guaranteed to work. In particular, this is necessary for `struct the other side is guaranteed to work. In particular, this is necessary for
Foo(f32)` to always have the same ABI as `f32`. `struct Foo(f32)` or `enum Foo { Bar(f32) }` to always have the same ABI as `f32`.
This repr is only considered part of the public ABI of a type if either the single This repr is only considered part of the public ABI of a type if either the single
field is `pub`, or if its layout is documented in prose. Otherwise, the layout should field is `pub`, or if its layout is documented in prose. Otherwise, the layout should
not be relied upon by other crates. not be relied upon by other crates.
More details are in the [RFC][rfc-transparent]. More details are in the [RFC 1758][rfc-transparent] and the [RFC 2645][rfc-transparent-unions-enums].
## repr(u*), repr(i*) ## repr(u*), repr(i*)
@ -153,8 +155,10 @@ This is a modifier on `repr(C)` and `repr(Rust)`. It is incompatible with
[unsafe code guidelines]: https://rust-lang.github.io/unsafe-code-guidelines/layout.html [unsafe code guidelines]: https://rust-lang.github.io/unsafe-code-guidelines/layout.html
[drop flags]: drop-flags.html [drop flags]: drop-flags.html
[ub loads]: https://github.com/rust-lang/rust/issues/27060 [ub loads]: https://github.com/rust-lang/rust/issues/27060
[issue-60405]: https://github.com/rust-lang/rust/issues/60405
[`UnsafeCell`]: ../std/cell/struct.UnsafeCell.html [`UnsafeCell`]: ../std/cell/struct.UnsafeCell.html
[rfc-transparent]: https://github.com/rust-lang/rfcs/blob/master/text/1758-repr-transparent.md [rfc-transparent]: https://github.com/rust-lang/rfcs/blob/master/text/1758-repr-transparent.md
[rfc-transparent-unions-enums]: https://rust-lang.github.io/rfcs/2645-transparent-unions.html
[really-tagged]: https://github.com/rust-lang/rfcs/blob/master/text/2195-really-tagged-unions.md [really-tagged]: https://github.com/rust-lang/rfcs/blob/master/text/2195-really-tagged-unions.md
[rust-bindgen]: https://rust-lang.github.io/rust-bindgen/ [rust-bindgen]: https://rust-lang.github.io/rust-bindgen/
[cbindgen]: https://github.com/eqrion/cbindgen [cbindgen]: https://github.com/eqrion/cbindgen
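The layout guarantee described in the `repr(transparent)` hunk above can be checked with a short sketch. The type names `Meters` and `Wrapper` and the two conversion functions are hypothetical; the example only assumes that `#[repr(transparent)]` is accepted on a single-field struct and on a single-variant enum, as the revised text states.

```rust
use std::mem::{align_of, size_of, transmute};

// Hypothetical newtype: guaranteed to share layout and ABI with `f32`.
#[repr(transparent)]
struct Meters(f32);

// Hypothetical single-variant enum with the same guarantee.
#[repr(transparent)]
enum Wrapper {
    Only(f32),
}

fn meters_to_f32(m: Meters) -> f32 {
    // Sound because `Meters` is `repr(transparent)` over `f32`.
    unsafe { transmute::<Meters, f32>(m) }
}

fn wrapper_to_f32(w: Wrapper) -> f32 {
    // Sound for the same reason: one variant, one non-zero-sized field.
    unsafe { transmute::<Wrapper, f32>(w) }
}

fn main() {
    assert_eq!(size_of::<Meters>(), size_of::<f32>());
    assert_eq!(align_of::<Wrapper>(), align_of::<f32>());
    assert_eq!(meters_to_f32(Meters(1.5)), 1.5);
    assert_eq!(wrapper_to_f32(Wrapper::Only(2.0)), 2.0);
}
```

In ordinary code, field access or pattern matching is preferable to `transmute`; the transmutes here exist only to demonstrate the guarantee.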

@ -24,7 +24,7 @@ We do this using `PhantomData`, which is a special marker type. `PhantomData`
consumes no space, but simulates a field of the given type for the purpose of consumes no space, but simulates a field of the given type for the purpose of
static analysis. This was deemed to be less error-prone than explicitly telling static analysis. This was deemed to be less error-prone than explicitly telling
the type-system the kind of variance that you want, while also providing other the type-system the kind of variance that you want, while also providing other
useful things such as the information needed by drop check. useful things such as auto traits and the information needed by drop check.
Iter logically contains a bunch of `&'a T`s, so this is exactly what we tell Iter logically contains a bunch of `&'a T`s, so this is exactly what we tell
the `PhantomData` to simulate: the `PhantomData` to simulate:
@ -106,7 +106,14 @@ that that `Vec<T>` _owns_ values of type `T` (more precisely: may use values of
in its `Drop` implementation), and Rust will thus not allow them to _dangle_ should a in its `Drop` implementation), and Rust will thus not allow them to _dangle_ should a
`Vec<T>` be dropped. `Vec<T>` be dropped.
**Adding an extra `_owns_T: PhantomData<T>` field is thus _superfluous_ and accomplishes nothing**. When a type already has a `Drop` impl, **adding an extra `_owns_T: PhantomData<T>` field
is thus _superfluous_ and accomplishes nothing**, dropck-wise (it still affects variance
and auto-traits).
- (advanced edge case: if the type containing the `PhantomData` has no `Drop` impl at all,
but still has drop glue (by having _another_ field with drop glue), then the
dropck/`#[may_dangle]` considerations mentioned herein do apply as well: a `PhantomData<T>`
field will then require `T` to be droppable whenever the containing type goes out of scope).
___ ___
@ -234,14 +241,18 @@ standard library made a utility for itself called `Unique<T>` which:
Here's a table of all the wonderful ways `PhantomData` could be used: Here's a table of all the wonderful ways `PhantomData` could be used:
| Phantom type | `'a` | `T` | | Phantom type | variance of `'a` | variance of `T` | `Send`/`Sync`<br/>(or lack thereof) | dangling `'a` or `T` in drop glue<br/>(_e.g._, `#[may_dangle] Drop`) |
|-----------------------------|-----------|---------------------------| |-----------------------------|:----------------:|:-----------------:|:-----------------------------------------:|:------------------------------------------------:|
| `PhantomData<T>` | - | covariant (with drop check) | | `PhantomData<T>` | - | **cov**ariant | inherited | disallowed ("owns `T`") |
| `PhantomData<&'a T>` | covariant | covariant | | `PhantomData<&'a T>` | **cov**ariant | **cov**ariant | `Send + Sync`<br/>requires<br/>`T : Sync` | allowed |
| `PhantomData<&'a mut T>` | covariant | invariant | | `PhantomData<&'a mut T>` | **cov**ariant | **inv**ariant | inherited | allowed |
| `PhantomData<*const T>` | - | covariant | | `PhantomData<*const T>` | - | **cov**ariant | `!Send + !Sync` | allowed |
| `PhantomData<*mut T>` | - | invariant | | `PhantomData<*mut T>` | - | **inv**ariant | `!Send + !Sync` | allowed |
| `PhantomData<fn(T)>` | - | contravariant | | `PhantomData<fn(T)>` | - | **contra**variant | `Send + Sync` | allowed |
| `PhantomData<fn() -> T>` | - | covariant | | `PhantomData<fn() -> T>` | - | **cov**ariant | `Send + Sync` | allowed |
| `PhantomData<fn(T) -> T>` | - | invariant | | `PhantomData<fn(T) -> T>` | - | **inv**ariant | `Send + Sync` | allowed |
| `PhantomData<Cell<&'a ()>>` | invariant | - | | `PhantomData<Cell<&'a ()>>` | **inv**ariant | - | `Send + !Sync` | allowed |
- Note: opting out of the `Unpin` auto-trait requires the dedicated [`PhantomPinned`] type instead.
[`PhantomPinned`]: ../core/marker/struct.PhantomPinned.html
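Two rows of the `Send`/`Sync` column in the table above can be spot-checked at compile time. The sketch below uses hypothetical types `Producer` and `Owner` and a hypothetical helper `assert_send_sync`; it covers only the `PhantomData<fn() -> T>` and `PhantomData<T>` rows, not the whole table.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::rc::Rc;

// Compiles only when `T` is both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>() {}

// Hypothetical marker-only types mirroring two table rows.
#[allow(dead_code)]
struct Producer<T> {
    _marker: PhantomData<fn() -> T>, // `Send + Sync` regardless of `T`
}

#[allow(dead_code)]
struct Owner<T> {
    _marker: PhantomData<T>, // inherits `Send`/`Sync` from `T`
}

fn main() {
    // `Rc<u8>` is neither `Send` nor `Sync`, yet `Producer<Rc<u8>>` is both.
    assert_send_sync::<Producer<Rc<u8>>>();
    // `Owner<u8>` is `Send + Sync` because `u8` is.
    assert_send_sync::<Owner<u8>>();
    // assert_send_sync::<Owner<Rc<u8>>>(); // would fail to compile
    // `PhantomData` consumes no space.
    assert_eq!(size_of::<Producer<Rc<u8>>>(), 0);
}
```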

@ -6,26 +6,28 @@ Safe Rust guarantees an absence of data races, which are defined as:
* one or more of them is a write * one or more of them is a write
* one or more of them is unsynchronized * one or more of them is unsynchronized
A data race has Undefined Behavior, and is therefore impossible to perform A data race has Undefined Behavior, and is therefore impossible to perform in
in Safe Rust. Data races are *mostly* prevented through Rust's ownership system: Safe Rust. Data races are *mostly* prevented through Rust's ownership system:
it's impossible to alias a mutable reference, so it's impossible to perform a it's impossible to alias a mutable reference, so it's impossible to perform a
data race. Interior mutability makes this more complicated, which is largely why data race. Interior mutability makes this more complicated, which is largely why
we have the Send and Sync traits (see below). we have the Send and Sync traits (see the next section for more on this).
**However Rust does not prevent general race conditions.** **However Rust does not prevent general race conditions.**
This is pretty fundamentally impossible, and probably honestly undesirable. Your This is mathematically impossible in situations where you do not control the
hardware is racy, your OS is racy, the other programs on your computer are racy, scheduler, which is true for the normal OS environment. If you do control
and the world this all runs in is racy. Any system that could genuinely claim to preemption, it _can be_ possible to prevent general races - this technique is
prevent *all* race conditions would be pretty awful to use, if not just used by frameworks such as [RTIC](https://github.com/rtic-rs/rtic). However,
incorrect. actually having control over scheduling is a very uncommon case.
So it's perfectly "fine" for a Safe Rust program to get deadlocked or do For this reason, it is considered "safe" for Rust to get deadlocked or do
something nonsensical with incorrect synchronization. Obviously such a program something nonsensical with incorrect synchronization: this is known as a general
isn't very good, but Rust can only hold your hand so far. Still, a race race condition or resource race. Obviously such a program isn't very good, but
condition can't violate memory safety in a Rust program on its own. Only in Rust of course cannot prevent all logic errors.
conjunction with some other unsafe code can a race condition actually violate
memory safety. For instance: In any case, a race condition cannot violate memory safety in a Rust program on
its own. Only in conjunction with some other unsafe code can a race condition
actually violate memory safety. For instance, a correct program looks like this:
```rust,no_run ```rust,no_run
use std::thread; use std::thread;
@ -58,6 +60,9 @@ thread::spawn(move || {
println!("{}", data[idx.load(Ordering::SeqCst)]); println!("{}", data[idx.load(Ordering::SeqCst)]);
``` ```
We can cause a data race if we instead do the bound check in advance, and then
unsafely access the data with an unchecked value:
```rust,no_run ```rust,no_run
use std::thread; use std::thread;
use std::sync::atomic::{AtomicUsize, Ordering}; use std::sync::atomic::{AtomicUsize, Ordering};

@ -94,6 +94,7 @@ to the heap.
use std::{ use std::{
mem::{align_of, size_of}, mem::{align_of, size_of},
ptr, ptr,
cmp::max,
}; };
struct Carton<T>(ptr::NonNull<T>); struct Carton<T>(ptr::NonNull<T>);
@ -105,8 +106,8 @@ impl<T> Carton<T> {
let mut memptr: *mut T = ptr::null_mut(); let mut memptr: *mut T = ptr::null_mut();
unsafe { unsafe {
let ret = libc::posix_memalign( let ret = libc::posix_memalign(
(&mut memptr).cast(), (&mut memptr as *mut *mut T).cast(),
align_of::<T>(), max(align_of::<T>(), size_of::<usize>()),
size_of::<T>() size_of::<T>()
); );
assert_eq!(ret, 0, "Failed to allocate or invalid alignment"); assert_eq!(ret, 0, "Failed to allocate or invalid alignment");
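The `max(align_of::<T>(), size_of::<usize>())` change above reflects the POSIX requirement that the alignment passed to `posix_memalign` be a power of two and a multiple of `sizeof(void *)`. The sketch below isolates that computation without calling into `libc`; the helper name `posix_memalign_alignment` and the `CacheLine` type are hypothetical, and it assumes a target where `usize` and pointers have the same size.

```rust
use std::cmp::max;
use std::mem::{align_of, size_of};

// Hypothetical helper: the alignment argument the patched code would pass.
fn posix_memalign_alignment<T>() -> usize {
    // `align_of::<T>()` is always a power of two; taking the maximum with
    // the pointer-sized `usize` keeps it a multiple of `sizeof(void *)`.
    max(align_of::<T>(), size_of::<usize>())
}

// Hypothetical over-aligned type, to show that larger alignments pass through.
#[allow(dead_code)]
#[repr(align(64))]
struct CacheLine([u8; 64]);

fn main() {
    // `u8` has alignment 1, which is too small on its own.
    assert_eq!(posix_memalign_alignment::<u8>(), size_of::<usize>());
    assert_eq!(posix_memalign_alignment::<CacheLine>(), 64);
    assert!(posix_memalign_alignment::<u16>().is_power_of_two());
}
```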

@ -1,189 +1,166 @@
# Subtyping and Variance # Subtyping and Variance
Subtyping is a relationship between types that allows statically typed Rust uses lifetimes to track the relationships between borrows and ownership.
languages to be a bit more flexible and permissive. However, a naive implementation of lifetimes would be either too restrictive,
or permit undefined behavior.
Subtyping in Rust is a bit different from subtyping in other languages. This In order to allow flexible usage of lifetimes
makes it harder to give simple examples, which is a problem since subtyping, while also preventing their misuse, Rust uses **subtyping** and **variance**.
and especially variance, is already hard to understand properly. As in,
even compiler writers mess it up all the time.
To keep things simple, this section will consider a small extension to the Let's start with an example.
Rust language that adds a new and simpler subtyping relationship. After
establishing concepts and issues under this simpler system,
we will then relate it back to how subtyping actually occurs in Rust.
So here's our simple extension, *Objective Rust*, featuring three new types:
```rust ```rust
trait Animal { // Note: debug expects two parameters with the *same* lifetime
fn snuggle(&self); fn debug<'a>(a: &'a str, b: &'a str) {
fn eat(&mut self); println!("a = {a:?} b = {b:?}");
}
trait Cat: Animal {
fn meow(&self);
} }
trait Dog: Animal { fn main() {
fn bark(&self); let hello: &'static str = "hello";
{
let world = String::from("world");
let world = &world; // 'world has a shorter lifetime than 'static
debug(hello, world);
}
} }
``` ```
But unlike normal traits, we can use them as concrete and sized types, just like structs. In a conservative implementation of lifetimes, since `hello` and `world` have different lifetimes,
we might see the following error:
Now, say we have a very simple function that takes an Animal, like this:
<!-- ignore: simplified code --> ```text
```rust,ignore error[E0308]: mismatched types
fn love(pet: Animal) { --> src/main.rs:10:16
pet.snuggle(); |
} 10 | debug(hello, world);
| ^
| |
| expected `&'static str`, found struct `&'world str`
``` ```
By default, static types must match *exactly* for a program to compile. As such, This would be rather unfortunate. In this case,
this code won't compile: what we want is to accept any type that lives *at least as long* as `'world`.
Let's try using subtyping with our lifetimes.
<!-- ignore: simplified code --> ## Subtyping
```rust,ignore
let mr_snuggles: Cat = ...;
love(mr_snuggles); // ERROR: expected Animal, found Cat
```
Mr. Snuggles is a Cat, and Cats aren't *exactly* Animals, so we can't love him! 😿 Subtyping is the idea that one type can be used in place of another.
This is annoying because Cats *are* Animals. They support every operation Let's define that `Sub` is a subtype of `Super` (we'll be using the notation `Sub <: Super` throughout this chapter).
an Animal supports, so intuitively `love` shouldn't care if we pass it a `Cat`.
We should be able to just **forget** the non-animal parts of our `Cat`, as they
aren't necessary to love it.
This is exactly the problem that *subtyping* is intended to fix. Because Cats are just What this is suggesting to us is that the set of *requirements* that `Super` defines
Animals **and more**, we say Cat is a *subtype* of Animal (because Cats are a *subset* are completely satisfied by `Sub`. `Sub` may then have more requirements.
of all the Animals). Equivalently, we say that Animal is a *supertype* of Cat.
With subtypes, we can tweak our overly strict static type system
with a simple rule: anywhere a value of type `T` is expected, we will also
accept values that are subtypes of `T`.
Or more concretely: anywhere an Animal is expected, a Cat or Dog will also work. Now, in order to use subtyping with lifetimes, we need to define the requirement of a lifetime:
As we will see throughout the rest of this section, subtyping is a lot more complicated > `'a` defines a region of code.
and subtle than this, but this simple rule is a very good 99% intuition. And unless you
write unsafe code, the compiler will automatically handle all the corner cases for you.
But this is the Rustonomicon. We're writing unsafe code, so we need to understand how Now that we have a defined set of requirements for lifetimes, we can define how they relate to each other:
this stuff really works, and how we can mess it up.
The core problem is that this rule, naively applied, will lead to *meowing Dogs*. That is, > `'long <: 'short` if and only if `'long` defines a region of code that **completely contains** `'short`.
we can convince someone that a Dog is actually a Cat. This completely destroys the fabric
of our static type system, making it worse than useless (and leading to Undefined Behavior).
Here's a simple example of this happening when we apply subtyping in a completely naive `'long` may define a region larger than `'short`, but that still fits our definition.
"find and replace" way.
<!-- ignore: simplified code --> > As we will see throughout the rest of this chapter,
```rust,ignore subtyping is a lot more complicated and subtle than this,
fn evil_feeder(pet: &mut Animal) { but this simple rule is a very good 99% intuition.
let spike: Dog = ...; And unless you write unsafe code, the compiler will automatically handle all the corner cases for you.
> But this is the Rustonomicon. We're writing unsafe code,
so we need to understand how this stuff really works, and how we can mess it up.
// `pet` is an Animal, and Dog is a subtype of Animal, Going back to our example above, we can say that `'static <: 'world`.
// so this should be fine, right..? For now, let's also accept the idea that subtypes of lifetimes can be passed through references
*pet = spike; (more on this in [Variance](#variance)),
_e.g._ `&'static str` is a subtype of `&'world str`, so we can "downgrade" `&'static str` into a `&'world str`.
With that, the example above will compile:
```rust
fn debug<'a>(a: &'a str, b: &'a str) {
println!("a = {a:?} b = {b:?}");
} }
fn main() { fn main() {
let mut mr_snuggles: Cat = ...; let hello: &'static str = "hello";
evil_feeder(&mut mr_snuggles); // Replaces mr_snuggles with a Dog {
mr_snuggles.meow(); // OH NO, MEOWING DOG! let world = String::from("world");
let world = &world; // 'world has a shorter lifetime than 'static
debug(hello, world); // hello silently downgrades from `&'static str` into `&'world str`
}
} }
``` ```
Clearly, we need a more robust system than "find and replace". That system is *variance*, ## Variance
which is a set of rules governing how subtyping should compose. Most importantly, variance
defines situations where subtyping should be disabled.
But before we get into variance, let's take a quick peek at where subtyping actually occurs in
Rust: *lifetimes*!
> NOTE: The typed-ness of lifetimes is a fairly arbitrary construct that some
> disagree with. However it simplifies our analysis to treat lifetimes
> and types uniformly.
Lifetimes are just regions of code, and regions can be partially ordered with the *contains* Above, we glossed over the fact that `'static <: 'b` implied that `&'static T <: &'b T`. This uses a property known as _variance_.
(outlives) relationship. Subtyping on lifetimes is in terms of that relationship: It's not always as simple as this example, though. To understand that, let's try to extend this example a bit:
if `'big: 'small` ("big contains small" or "big outlives small"), then `'big` is a subtype
of `'small`. This is a large source of confusion, because it seems backwards
to many: the bigger region is a *subtype* of the smaller region. But it makes
sense if you consider our Animal example: Cat is an Animal *and more*,
just as `'big` is `'small` *and more*.
Put another way, if someone wants a reference that lives for `'small`, ```rust,compile_fail,E0597
usually what they actually mean is that they want a reference that lives fn assign<T>(input: &mut T, val: T) {
for *at least* `'small`. They don't actually care if the lifetimes match *input = val;
exactly. So it should be ok for us to **forget** that something lives for }
`'big` and only remember that it lives for `'small`.
The meowing dog problem for lifetimes will result in us being able to fn main() {
store a short-lived reference in a place that expects a longer-lived one, let mut hello: &'static str = "hello";
creating a dangling reference and letting us use-after-free. {
let world = String::from("world");
assign(&mut hello, &world);
}
println!("{hello}"); // use after free 😿
}
```
It will be useful to note that `'static`, the forever lifetime, is a subtype of In `assign`, we are setting the `hello` reference to point to `world`.
every lifetime because by definition it outlives everything. We will be using But then `world` goes out of scope, before the later use of `hello` in the `println!`.
this relationship in later examples to keep them as simple as possible.
With all that said, we still have no idea how to actually *use* subtyping of lifetimes, This is a classic use-after-free bug!
because nothing ever has type `'a`. Lifetimes only occur as part of some larger type
like `&'a u32` or `IterMut<'a, u32>`. To apply lifetime subtyping, we need to know
how to compose subtyping. Once again, we need *variance*.
## Variance Our first instinct might be to blame the `assign` impl, but there's really nothing wrong here.
It shouldn't be surprising that we might want to assign a `T` into a `T`.
Variance is where things get a bit complicated. The problem is that we cannot assume that `&mut &'static str` and `&mut &'b str` are compatible.
This means that `&mut &'static str` **cannot** be a *subtype* of `&mut &'b str`,
even if `'static` is a subtype of `'b`.
Variance is a property that *type constructors* have with respect to their Variance is the concept that Rust borrows to define relationships about subtypes through their generic parameters.
arguments. A type constructor in Rust is any generic type with unbound arguments.
For instance `Vec` is a type constructor that takes a type `T` and returns
`Vec<T>`. `&` and `&mut` are type constructors that take two inputs: a
lifetime, and a type to point to.
> NOTE: For convenience we will often refer to `F<T>` as a type constructor just so > NOTE: For convenience we will define a generic type `F<T>` so
> that we can easily talk about `T`. Hopefully this is clear in context. > that we can easily talk about `T`. Hopefully this is clear in context.
A type constructor F's *variance* is how the subtyping of its inputs affects the The type `F`'s *variance* is how the subtyping of its inputs affects the
subtyping of its outputs. There are three kinds of variance in Rust. Given two subtyping of its outputs. There are three kinds of variance in Rust. Given two
types `Sub` and `Super`, where `Sub` is a subtype of `Super`: types `Sub` and `Super`, where `Sub` is a subtype of `Super`:
* `F` is *covariant* if `F<Sub>` is a subtype of `F<Super>` (subtyping "passes through") * `F` is **covariant** if `F<Sub>` is a subtype of `F<Super>` (the subtype property is passed through)
* `F` is *contravariant* if `F<Super>` is a subtype of `F<Sub>` (subtyping is "inverted") * `F` is **contravariant** if `F<Super>` is a subtype of `F<Sub>` (the subtype property is "inverted")
* `F` is *invariant* otherwise (no subtyping relationship exists) * `F` is **invariant** otherwise (no subtyping relationship exists)
If `F` has multiple type parameters, we can talk about the individual variances If we remember from the above examples,
by saying that, for example, `F<T, U>` is covariant over `T` and invariant over `U`. it was ok for us to treat `&'a T` as a subtype of `&'b T` if `'a <: 'b`,
therefore we can say that `&'a T` is *covariant* over `'a`.
It is very useful to keep in mind that covariance is, in practical terms, "the" Also, we saw that it was not ok for us to treat `&mut &'a U` as a subtype of `&mut &'b U`,
variance. Almost all consideration of variance is in terms of whether something therefore we can say that `&mut T` is *invariant* over `T`.
should be covariant or invariant. Actually witnessing contravariance is quite difficult
in Rust, though it does in fact exist.
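
The covariance of `&'a T` over `'a` can be checked with a function whose only job is to shorten a lifetime. The following is a minimal sketch; the names `shorten`, `longer`, and `covariance_demo` are hypothetical and exist only for illustration.

```rust
/// Compiles only because `&'a T` is covariant over `'a`:
/// `'long: 'short` means `'long <: 'short`, so `&'long str <: &'short str`.
fn shorten<'long: 'short, 'short>(r: &'long str) -> &'short str {
    r
}

// The `&mut` analogue is rejected, because `&mut T` is invariant over `T`:
//
// fn shorten_mut<'long: 'short, 'short>(
//     r: &'short mut &'long str,
// ) -> &'short mut &'short str {
//     r // error: lifetime may not live long enough
// }

/// Hypothetical helper: both arguments must be viewed at the same lifetime `'a`.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

/// Mixes a `'static` reference with a reference to a local `String`.
fn covariance_demo() -> usize {
    let hello: &'static str = "hello";
    let world = String::from("world!");
    // `hello` is downgraded to the (shorter) lifetime of the borrow of `world`.
    let picked = longer(shorten(hello), &world);
    picked.len()
}
```

Uncommenting `shorten_mut` should produce a lifetime error, which is the invariance of `&mut T` showing up in practice.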
Here is a table of important variances which the rest of this section will be devoted Here is a table of some other generic types and their variances:
to trying to explain:
| | | 'a | T | U | | | 'a | T | U |
|---|-----------------|:---------:|:-----------------:|:---------:| |-----------------|:---------:|:-----------------:|:---------:|
| * | `&'a T ` | covariant | covariant | | | `&'a T ` | covariant | covariant | |
| * | `&'a mut T` | covariant | invariant | | | `&'a mut T` | covariant | invariant | |
| * | `Box<T>` | | covariant | | | `Box<T>` | | covariant | |
| | `Vec<T>` | | covariant | | | `Vec<T>` | | covariant | |
| * | `UnsafeCell<T>` | | invariant | | | `UnsafeCell<T>` | | invariant | |
| | `Cell<T>` | | invariant | | | `Cell<T>` | | invariant | |
| * | `fn(T) -> U` | | **contra**variant | covariant | | `fn(T) -> U` | | **contra**variant | covariant |
| | `*const T` | | covariant | | | `*const T` | | covariant | |
| | `*mut T` | | invariant | | | `*mut T` | | invariant | |
The types with \*'s are the ones we will be focusing on, as they are in Some of these can be explained simply in relation to the others:
some sense "fundamental". All the others can be understood by analogy to the others:
* `Vec<T>` and all other owning pointers and collections follow the same logic as `Box<T>` * `Vec<T>` and all other owning pointers and collections follow the same logic as `Box<T>`
* `Cell<T>` and all other interior mutability types follow the same logic as `UnsafeCell<T>` * `Cell<T>` and all other interior mutability types follow the same logic as `UnsafeCell<T>`
* `UnsafeCell<T>` having interior mutability gives it the same variance properties as `&mut T`
* `*const T` follows the logic of `&T` * `*const T` follows the logic of `&T`
* `*mut T` follows the logic of `&mut T` (or `UnsafeCell<T>`) * `*mut T` follows the logic of `&mut T` (or `UnsafeCell<T>`)
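
A few rows of the table can be checked by asking the compiler to convert a type parameterized by `'static` into the same type parameterized by an arbitrary `'a`. This is only a sketch; the function names are hypothetical.

```rust
/// `Vec<T>` is covariant over `T`, so a `Vec<&'static str>` can be
/// returned where a `Vec<&'a str>` is expected.
fn shrink_vec<'a>(v: Vec<&'static str>) -> Vec<&'a str> {
    v
}

/// `*const T` is covariant over `T`, following the logic of `&T`.
fn shrink_const<'a>(p: *const &'static str) -> *const &'a str {
    p
}

// The invariant rows do not allow the same conversion. Both of these are
// rejected with a lifetime error if uncommented:
//
// fn shrink_cell<'a>(c: std::cell::Cell<&'static str>) -> std::cell::Cell<&'a str> {
//     c
// }
//
// fn shrink_mut<'a>(p: *mut &'static str) -> *mut &'a str {
//     p
// }
```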
@ -197,116 +174,45 @@ For more types, see the ["Variance" section][variance-table] on the reference.
> take references with specific lifetimes (as opposed to the usual "any lifetime", > take references with specific lifetimes (as opposed to the usual "any lifetime",
> which gets into higher rank lifetimes, which work independently of subtyping). > which gets into higher rank lifetimes, which work independently of subtyping).
Ok, that's enough type theory! Let's try to apply the concept of variance to Rust Now that we have some more formal understanding of variance,
and look at some examples. let's go through some more examples in more detail.
First off, let's revisit the meowing dog example:
<!-- ignore: simplified code -->
```rust,ignore
fn evil_feeder(pet: &mut Animal) {
let spike: Dog = ...;
// `pet` is an Animal, and Dog is a subtype of Animal,
// so this should be fine, right..?
*pet = spike;
}
fn main() { ```rust,compile_fail,E0597
let mut mr_snuggles: Cat = ...; fn assign<T>(input: &mut T, val: T) {
evil_feeder(&mut mr_snuggles); // Replaces mr_snuggles with a Dog
mr_snuggles.meow(); // OH NO, MEOWING DOG!
}
```
If we look at our table of variances, we see that `&mut T` is *invariant* over `T`.
As it turns out, this completely fixes the issue! With invariance, the fact that
Cat is a subtype of Animal doesn't matter; `&mut Cat` still won't be a subtype of
`&mut Animal`. The static type checker will then correctly stop us from passing
a Cat into `evil_feeder`.
The soundness of subtyping is based on the idea that it's ok to forget unnecessary
details. But with references, there's always someone that remembers those details:
the value being referenced. That value expects those details to keep being true,
and may behave incorrectly if its expectations are violated.
The problem with making `&mut T` covariant over `T` is that it gives us the power
to modify the original value *when we don't remember all of its constraints*.
And so, we can make someone have a Dog when they're certain they still have a Cat.
With that established, we can easily see why `&T` being covariant over `T` *is*
sound: it doesn't let you modify the value, only look at it. Without any way to
mutate, there's no way for us to mess with any details. We can also see why
`UnsafeCell` and all the other interior mutability types must be invariant: they
make `&T` work like `&mut T`!
Now what about the lifetime on references? Why is it ok for both kinds of references
to be covariant over their lifetimes? Well, here's a two-pronged argument:
First and foremost, subtyping references based on their lifetimes is *the entire point
of subtyping in Rust*. The only reason we have subtyping is so we can pass
long-lived things where short-lived things are expected. So it better work!
Second, and more seriously, lifetimes are only a part of the reference itself. The
type of the referent is shared knowledge, which is why adjusting that type in only
one place (the reference) can lead to issues. But if you shrink down a reference's
lifetime when you hand it to someone, that lifetime information isn't shared in
any way. There are now two independent references with independent lifetimes.
There's no way to mess with original reference's lifetime using the other one.
Or rather, the only way to mess with someone's lifetime is to build a meowing dog.
But as soon as you try to build a meowing dog, the lifetime should be wrapped up
in an invariant type, preventing the lifetime from being shrunk. To understand this
better, let's port the meowing dog problem over to real Rust.
In the meowing dog problem we take a subtype (Cat), convert it into a supertype
(Animal), and then use that fact to overwrite the subtype with a value that satisfies
the constraints of the supertype but not the subtype (Dog).
So with lifetimes, we want to take a long-lived thing, convert it into a
short-lived thing, and then use that to write something that doesn't live long
enough into the place expecting something long-lived.
Here it is:
```rust,compile_fail
fn evil_feeder<T>(input: &mut T, val: T) {
*input = val; *input = val;
} }
fn main() { fn main() {
let mut mr_snuggles: &'static str = "meow! :3"; // mr. snuggles forever!! let mut hello: &'static str = "hello";
{ {
let spike = String::from("bark! >:V"); let world = String::from("world");
let spike_str: &str = &spike; // Only lives for the block assign(&mut hello, &world);
evil_feeder(&mut mr_snuggles, spike_str); // EVIL!
} }
println!("{}", mr_snuggles); // Use after free? println!("{hello}");
} }
``` ```
And what do we get when we run this? And what do we get when we run this?
```text ```text
error[E0597]: `spike` does not live long enough error[E0597]: `world` does not live long enough
--> src/main.rs:9:31 --> src/main.rs:9:28
| |
6 | let mut mr_snuggles: &'static str = "meow! :3"; // mr. snuggles forever!! 6 | let mut hello: &'static str = "hello";
| ------------ type annotation requires that `spike` is borrowed for `'static` | ------------ type annotation requires that `world` is borrowed for `'static`
... ...
9 | let spike_str: &str = &spike; // Only lives for the block 9 | assign(&mut hello, &world);
| ^^^^^^ borrowed value does not live long enough | ^^^^^^ borrowed value does not live long enough
10 | evil_feeder(&mut mr_snuggles, spike_str); // EVIL! 10 | }
11 | } | - `world` dropped here while still borrowed
| - `spike` dropped here while still borrowed
``` ```
Good, it doesn't compile! Let's break down what's happening here in detail. Good, it doesn't compile! Let's break down what's happening here in detail.
First let's look at the new `evil_feeder` function: First let's look at the `assign` function:
```rust ```rust
fn evil_feeder<T>(input: &mut T, val: T) { fn assign<T>(input: &mut T, val: T) {
*input = val; *input = val;
} }
``` ```
@ -315,60 +221,43 @@ All it does is take a mutable reference and a value and overwrite the referent w
What's important about this function is that it creates a type equality constraint. It What's important about this function is that it creates a type equality constraint. It
clearly says in its signature the referent and the value must be the *exact same* type. clearly says in its signature the referent and the value must be the *exact same* type.
Meanwhile, in the caller we pass in `&mut &'static str` and `&'spike_str str`. Meanwhile, in the caller we pass in `&mut &'static str` and `&'world str`.
Because `&mut T` is invariant over `T`, the compiler concludes it can't apply any subtyping Because `&mut T` is invariant over `T`, the compiler concludes it can't apply any subtyping
to the first argument, and so `T` must be exactly `&'static str`. to the first argument, and so `T` must be exactly `&'static str`.
The other argument is only an `&'a str`, which *is* covariant over `'a`. So the compiler This is counter to the `&T` case:
adopts a constraint: `&'spike_str str` must be a subtype of `&'static str` (inclusive),
which in turn implies `'spike_str` must be a subtype of `'static` (inclusive). Which is to say,
`'spike_str` must contain `'static`. But only one thing contains `'static` -- `'static` itself!
This is why we get an error when we try to assign `&spike` to `spike_str`. The ```rust
compiler has worked backwards to conclude `spike_str` must live forever, and `&spike` fn debug<T: std::fmt::Debug>(a: T, b: T) {
simply can't live that long. println!("a = {a:?} b = {b:?}");
}
```
So even though references are covariant over their lifetimes, they "inherit" invariance where similarly `a` and `b` must have the same type `T`.
whenever they're put into a context that could do something bad with that. In this case, But since `&'a T` *is* covariant over `'a`, we are allowed to perform subtyping.
we inherited invariance as soon as we put our reference inside an `&mut T`. So the compiler decides that `&'static str` can become `&'b str` if and only if
`&'static str` is a subtype of `&'b str`, which will hold if `'static <: 'b`.
This is true, so the compiler is happy to continue compiling this code.
As it turns out, the argument for why it's ok for Box (and Vec, Hashmap, etc.) to As it turns out, the argument for why it's ok for Box (and Vec, HashMap, etc.) to be covariant is pretty similar to the argument for why it's ok for lifetimes to be covariant: as soon as you try to stuff them in something like a mutable reference, they inherit invariance and you're prevented from doing anything bad.
be covariant is pretty similar to the argument for why it's ok for
lifetimes to be covariant: as soon as you try to stuff them in something like a
mutable reference, they inherit invariance and you're prevented from doing anything
bad.
However Box makes it easier to focus on by-value aspect of references that we However Box makes it easier to focus on the by-value aspect of references that we partially glossed over.
partially glossed over.
Unlike a lot of languages which allow values to be freely aliased at all times, Unlike a lot of languages which allow values to be freely aliased at all times, Rust has a very strict rule: if you're allowed to mutate or move a value, you are guaranteed to be the only one with access to it.
Rust has a very strict rule: if you're allowed to mutate or move a value, you
are guaranteed to be the only one with access to it.
Consider the following code: Consider the following code:
<!-- ignore: simplified code -->
```rust,ignore ```rust,ignore
let mr_snuggles: Box<Cat> = ..; let hello: Box<&'static str> = Box::new("hello");
let spike: Box<Dog> = ..;
let mut pet: Box<Animal>; let mut world: Box<&'b str>;
pet = mr_snuggles; world = hello;
pet = spike;
``` ```
There is no problem at all with the fact that we have forgotten that `mr_snuggles` was a Cat, There is no problem at all with the fact that we have forgotten that `hello` was alive for `'static`,
or that we overwrote him with a Dog, because as soon as we moved mr_snuggles to a variable because as soon as we moved `hello` to a variable that only knew it was alive for `'b`,
that only knew he was an Animal, **we destroyed the only thing in the universe that **we destroyed the only thing in the universe that remembered it lived for longer**!
remembered he was a Cat**!
In contrast to the argument about immutable references being soundly covariant because they
don't let you change anything, owned values can be covariant because they make you
change *everything*. There is no connection between old locations and new locations.
Applying by-value subtyping is an irreversible act of knowledge destruction, and
without any memory of how things used to be, no one can be tricked into acting on
that old information!
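
The `Box` snippet above is simplified and not compiled. A version that does compile might look like the sketch below, where the hypothetical function `box_demo` overwrites a box holding a short-lived reference with one holding a `'static` reference.

```rust
fn box_demo() -> usize {
    let hello: Box<&'static str> = Box::new("hello");
    let local = String::from("hi");

    // `world` only knows that its contents live as long as the borrow of `local`.
    let mut world: Box<&str> = Box::new(local.as_str());
    let before = world.len();

    // `hello` is moved, so nothing remembers that it was `'static`.
    // This assignment relies on `Box<T>` being covariant over `T`.
    world = hello;

    before + world.len()
}
```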
Only one thing left to explain: function pointers. Only one thing left to explain: function pointers.
@ -376,43 +265,75 @@ To see why `fn(T) -> U` should be covariant over `U`, consider the following sig
<!-- ignore: simplified code --> <!-- ignore: simplified code -->
```rust,ignore ```rust,ignore
fn get_animal() -> Animal; fn get_str() -> &'a str;
``` ```
This function claims to produce an Animal. As such, it is perfectly valid to This function claims to produce a `str` bound by some lifetime `'a`. As such, it is perfectly valid to
provide a function with the following signature instead: provide a function with the following signature instead:
<!-- ignore: simplified code --> <!-- ignore: simplified code -->
```rust,ignore ```rust,ignore
fn get_animal() -> Cat; fn get_static() -> &'static str;
``` ```
After all, Cats are Animals, so always producing a Cat is a perfectly valid way So when the function is called, all it's expecting is a `&str` which lives at least the lifetime of `'a`,
to produce Animals. Or to relate it back to real Rust: if we need a function it doesn't matter if the value actually lives longer.
that is supposed to produce something that lives for `'short`, it's perfectly
fine for it to produce something that lives for `'long`. We don't care, we can
just forget that fact.
However, the same logic does not apply to *arguments*. Consider trying to satisfy: However, the same logic does not apply to *arguments*. Consider trying to satisfy:
<!-- ignore: simplified code --> <!-- ignore: simplified code -->
```rust,ignore ```rust,ignore
fn handle_animal(Animal); fn store_ref(&'a str);
``` ```
with: with:
<!-- ignore: simplified code --> <!-- ignore: simplified code -->
```rust,ignore ```rust,ignore
fn handle_animal(Cat); fn store_static(&'static str);
``` ```
The first function can accept Dogs, but the second function absolutely can't. The first function can accept any string reference as long as it lives at least for `'a`,
but the second cannot accept a string reference that lives for any duration less than `'static`,
which would cause a conflict.
Covariance doesn't work here. But if we flip it around, it actually *does* Covariance doesn't work here. But if we flip it around, it actually *does*
work! If we need a function that can handle Cats, a function that can handle *any* work! If we need a function that can handle `&'static str`, a function that can handle *any* reference lifetime
Animal will surely work fine. Or to relate it back to real Rust: if we need a will surely work fine.
function that can handle anything that lives for at least `'long`, it's perfectly
fine for it to be able to handle anything that lives for at least `'short`. Let's see this in practice:
```rust,compile_fail
# use std::cell::RefCell;
thread_local! {
pub static StaticVecs: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}
/// saves the input given into a thread local `Vec<&'static str>`
fn store(input: &'static str) {
StaticVecs.with_borrow_mut(|v| v.push(input));
}
/// Calls the function with its input (must have the same lifetime!)
fn demo<'a>(input: &'a str, f: fn(&'a str)) {
f(input);
}
fn main() {
demo("hello", store); // "hello" is 'static. Can call `store` fine
{
let smuggle = String::from("smuggle");
// `&smuggle` is not static. If we were to call `store` with `&smuggle`,
// we would have pushed an invalid lifetime into the `StaticVecs`.
// Therefore, `fn(&'static str)` cannot be a subtype of `fn(&'a str)`
demo(&smuggle, store);
}
// use after free 😿
StaticVecs.with_borrow(|v| println!("{v:?}"));
}
```
And that's why function types, unlike anything else in the language, are And that's why function types, unlike anything else in the language, are
**contra**variant over their arguments. **contra**variant over their arguments.
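
The direction that *is* allowed can also be written down and compiled. In this sketch (with the hypothetical names `widen`, `byte_len`, and `contravariance_demo`), a function pointer that accepts `&'a str` is used where one accepting only `&'static str` is required.

```rust
/// `fn(T) -> U` is contravariant over `T`. Since `&'static str <: &'a str`,
/// we get `fn(&'a str) -> usize <: fn(&'static str) -> usize`.
fn widen<'a>(f: fn(&'a str) -> usize) -> fn(&'static str) -> usize {
    f
}

// The opposite direction is rejected if uncommented, because the returned
// pointer could then be called with a reference that is not `'static`:
//
// fn narrow<'a>(f: fn(&'static str) -> usize) -> fn(&'a str) -> usize {
//     f
// }

/// A function that can handle a string slice of any lifetime.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn contravariance_demo() -> usize {
    let f: fn(&'static str) -> usize = widen(byte_len);
    f("hello")
}
```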

@ -2,7 +2,7 @@
Get out of our way type system! We're going to reinterpret these bits or die Get out of our way type system! We're going to reinterpret these bits or die
trying! Even though this book is all about doing things that are unsafe, I trying! Even though this book is all about doing things that are unsafe, I
really can't emphasize that you should deeply think about finding Another Way really can't emphasize enough that you should deeply think about finding Another Way
than the operations covered in this section. This is really, truly, the most than the operations covered in this section. This is really, truly, the most
horribly unsafe thing you can do in Rust. The guardrails here are dental floss. horribly unsafe thing you can do in Rust. The guardrails here are dental floss.

@ -1,13 +1,13 @@
# Unbounded Lifetimes # Unbounded Lifetimes
Unsafe code can often end up producing references or lifetimes out of thin air. Unsafe code can often end up producing references or lifetimes out of thin air.
Such lifetimes come into the world as *unbounded*. The most common source of this Such lifetimes come into the world as *unbounded*. The most common source of
is dereferencing a raw pointer, which produces a reference with an unbounded lifetime. this is taking a reference to a dereferenced raw pointer, which produces a
Such a lifetime becomes as big as context demands. This is in fact more powerful reference with an unbounded lifetime. Such a lifetime becomes as big as context
than simply becoming `'static`, because for instance `&'static &'a T` demands. This is in fact more powerful than simply becoming `'static`, because
will fail to typecheck, but the unbound lifetime will perfectly mold into for instance `&'static &'a T` will fail to typecheck, but the unbound lifetime
`&'a &'a T` as needed. However for most intents and purposes, such an unbounded will perfectly mold into `&'a &'a T` as needed. However for most intents and
lifetime can be regarded as `'static`. purposes, such an unbounded lifetime can be regarded as `'static`.
Almost no reference is `'static`, so this is probably wrong. `transmute` and Almost no reference is `'static`, so this is probably wrong. `transmute` and
`transmute_copy` are the two other primary offenders. One should endeavor to `transmute_copy` are the two other primary offenders. One should endeavor to
@ -17,17 +17,25 @@ boundaries.
Given a function, any output lifetimes that don't derive from inputs are Given a function, any output lifetimes that don't derive from inputs are
unbounded. For instance: unbounded. For instance:
<!-- ignore: simplified code --> <!-- no_run: This example exhibits undefined behavior. -->
```rust,ignore ```rust,no_run
fn get_str<'a>() -> &'a str; fn get_str<'a>(s: *const String) -> &'a str {
unsafe { &*s }
}
fn main() {
let soon_dropped = String::from("hello");
let dangling = get_str(&soon_dropped);
drop(soon_dropped);
println!("Invalid str: {}", dangling); // Invalid str: gӚ_`
}
``` ```
will produce an `&str` with an unbounded lifetime. The easiest way to avoid The easiest way to avoid unbounded lifetimes is to use lifetime elision at the
unbounded lifetimes is to use lifetime elision at the function boundary. function boundary. If an output lifetime is elided, then it *must* be bounded by
If an output lifetime is elided, then it *must* be bounded by an input lifetime. an input lifetime. Of course it might be bounded by the *wrong* lifetime, but
Of course it might be bounded by the *wrong* lifetime, but this will usually this will usually just cause a compiler error, rather than allow memory safety
just cause a compiler error, rather than allow memory safety to be trivially to be trivially violated.
violated.
Within a function, bounding lifetimes is more error-prone. The safest and easiest Within a function, bounding lifetimes is more error-prone. The safest and easiest
way to bound a lifetime is to return it from a function with a bound lifetime. way to bound a lifetime is to return it from a function with a bound lifetime.
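
For contrast with the unbounded `get_str` above, here is a sketch of a version that relies on lifetime elision. Because the output lifetime is elided, it is tied to the input reference, and the borrow checker can reject a premature `drop`. The function `elision_demo` is a hypothetical caller added for illustration.

```rust
/// The elided output lifetime is bounded by the lifetime of `s`.
fn get_str(s: &String) -> &str {
    s.as_str()
}

fn elision_demo() -> usize {
    let owner = String::from("hello");
    let bounded = get_str(&owner);
    // drop(owner); // rejected if uncommented: `owner` is still borrowed
    bounded.len()
}
```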

@ -11,7 +11,7 @@ Unsafe Rust gives us a powerful tool to handle this problem:
[`MaybeUninit`]. This type can be used to handle memory that has not been fully [`MaybeUninit`]. This type can be used to handle memory that has not been fully
initialized yet. initialized yet.
With `MaybeUninit`, we can initialize an array element-for-element as follows: With `MaybeUninit`, we can initialize an array element by element as follows:
```rust ```rust
use std::mem::{self, MaybeUninit}; use std::mem::{self, MaybeUninit};
@ -79,8 +79,7 @@ This code proceeds in three steps:
acknowledge that by providing appropriate methods). acknowledge that by providing appropriate methods).
It's worth spending a bit more time on the loop in the middle, and in particular It's worth spending a bit more time on the loop in the middle, and in particular
the assignment operator and its interaction with `drop`. If we would have the assignment operator and its interaction with `drop`. If we wrote something like:
written something like:
<!-- ignore: simplified code --> <!-- ignore: simplified code -->
```rust,ignore ```rust,ignore
@ -88,7 +87,7 @@ written something like:
``` ```
we would actually overwrite a `Box<u32>`, leading to `drop` of uninitialized we would actually overwrite a `Box<u32>`, leading to `drop` of uninitialized
data, which will cause much sadness and pain. data, which would cause much sadness and pain.
The correct alternative, if for some reason we cannot use `MaybeUninit::new`, is The correct alternative, if for some reason we cannot use `MaybeUninit::new`, is
to use the [`ptr`] module. In particular, it provides three functions that allow to use the [`ptr`] module. In particular, it provides three functions that allow
@ -97,7 +96,7 @@ us to assign bytes to a location in memory without dropping the old value:
* `ptr::write(ptr, val)` takes a `val` and moves it into the address pointed * `ptr::write(ptr, val)` takes a `val` and moves it into the address pointed
to by `ptr`. to by `ptr`.
* `ptr::copy(src, dest, count)` copies the bits that `count` T's would occupy * `ptr::copy(src, dest, count)` copies the bits that `count` T items would occupy
from src to dest. (this is equivalent to C's memmove -- note that the argument from src to dest. (this is equivalent to C's memmove -- note that the argument
order is reversed!) order is reversed!)
* `ptr::copy_nonoverlapping(src, dest, count)` does what `copy` does, but a * `ptr::copy_nonoverlapping(src, dest, count)` does what `copy` does, but a
@ -105,8 +104,8 @@ us to assign bytes to a location in memory without dropping the old value:
(this is equivalent to C's memcpy -- note that the argument order is reversed!) (this is equivalent to C's memcpy -- note that the argument order is reversed!)
It should go without saying that these functions, if misused, will cause serious It should go without saying that these functions, if misused, will cause serious
havoc or just straight up Undefined Behavior. The only things that these havoc or just straight up Undefined Behavior. The only requirement of these
functions *themselves* require is that the locations you want to read and write functions *themselves* is that the locations you want to read and write
are allocated and properly aligned. However, the ways writing arbitrary bits to are allocated and properly aligned. However, the ways writing arbitrary bits to
arbitrary locations of memory can break things are basically uncountable! arbitrary locations of memory can break things are basically uncountable!
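
As a rough illustration of two of these functions, the sketch below uses `ptr::write` to fill uninitialized slots without dropping their old contents, and `ptr::copy` for an overlapping, `memmove`-style shift. The function names `boxed_squares` and `shift_right` are hypothetical, and `std::array::from_fn` is used here only as one convenient way to build the uninitialized array.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Initializes an array of `Box<u32>` element by element.
fn boxed_squares() -> [Box<u32>; 4] {
    let mut slots: [MaybeUninit<Box<u32>>; 4] =
        std::array::from_fn(|_| MaybeUninit::uninit());

    for (i, slot) in slots.iter_mut().enumerate() {
        // `ptr::write` moves the value in without reading or dropping
        // whatever (uninitialized) bytes were in the slot before.
        unsafe { ptr::write(slot.as_mut_ptr(), Box::new((i * i) as u32)) };
    }

    // Every slot has been written, so the array is fully initialized.
    unsafe { mem::transmute::<_, [Box<u32>; 4]>(slots) }
}

/// Shifts the first four elements one position to the right.
/// The source and destination ranges overlap, which `ptr::copy` permits.
fn shift_right(buf: &mut [u32; 5]) {
    unsafe {
        let p = buf.as_mut_ptr();
        // Note the argument order: source first, then destination.
        ptr::copy(p, p.add(1), 4);
    }
}
```

`ptr::copy_nonoverlapping` would not be valid in `shift_right`, since the two ranges share elements.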

@ -28,7 +28,6 @@ impl<T> Vec<T> {
ptr: NonNull::dangling(), ptr: NonNull::dangling(),
len: 0, len: 0,
cap: 0, cap: 0,
_marker: PhantomData,
} }
} }
} }

@ -93,7 +93,7 @@ impl<T> IntoIterator for Vec<T> {
mem::forget(self); mem::forget(self);
IntoIter { IntoIter {
iter: iter, iter,
_buf: buf, _buf: buf,
} }
} }
@ -135,18 +135,16 @@ impl<'a, T> Drop for Drain<'a, T> {
impl<T> Vec<T> { impl<T> Vec<T> {
pub fn drain(&mut self) -> Drain<T> { pub fn drain(&mut self) -> Drain<T> {
unsafe { let iter = unsafe { RawValIter::new(&self) };
let iter = RawValIter::new(&self);
// this is a mem::forget safety thing. If Drain is forgotten, we just // this is a mem::forget safety thing. If Drain is forgotten, we just
// leak the whole Vec's contents. Also we need to do this *eventually* // leak the whole Vec's contents. Also we need to do this *eventually*
// anyway, so why not do it now? // anyway, so why not do it now?
self.len = 0; self.len = 0;
Drain { Drain {
iter: iter, iter,
vec: PhantomData, vec: PhantomData,
}
} }
} }
} }

@ -10,7 +10,6 @@ use std::ptr::{self, NonNull};
struct RawVec<T> { struct RawVec<T> {
ptr: NonNull<T>, ptr: NonNull<T>,
cap: usize, cap: usize,
_marker: PhantomData<T>,
} }
unsafe impl<T: Send> Send for RawVec<T> {} unsafe impl<T: Send> Send for RawVec<T> {}
@ -24,8 +23,7 @@ impl<T> RawVec<T> {
// `NonNull::dangling()` doubles as "unallocated" and "zero-sized allocation" // `NonNull::dangling()` doubles as "unallocated" and "zero-sized allocation"
RawVec { RawVec {
ptr: NonNull::dangling(), ptr: NonNull::dangling(),
cap: cap, cap,
_marker: PhantomData,
} }
} }
@ -129,7 +127,7 @@ impl<T> Vec<T> {
pub fn insert(&mut self, index: usize, elem: T) { pub fn insert(&mut self, index: usize, elem: T) {
assert!(index <= self.len, "index out of bounds"); assert!(index <= self.len, "index out of bounds");
if self.cap() == self.len { if self.len == self.cap() {
self.buf.grow(); self.buf.grow();
} }
@ -140,14 +138,17 @@ impl<T> Vec<T> {
self.len - index, self.len - index,
); );
ptr::write(self.ptr().add(index), elem); ptr::write(self.ptr().add(index), elem);
self.len += 1;
} }
self.len += 1;
} }
pub fn remove(&mut self, index: usize) -> T { pub fn remove(&mut self, index: usize) -> T {
assert!(index < self.len, "index out of bounds"); assert!(index < self.len, "index out of bounds");
self.len -= 1;
unsafe { unsafe {
self.len -= 1;
let result = ptr::read(self.ptr().add(index)); let result = ptr::read(self.ptr().add(index));
ptr::copy( ptr::copy(
self.ptr().add(index + 1), self.ptr().add(index + 1),
@ -159,18 +160,16 @@ impl<T> Vec<T> {
} }
pub fn drain(&mut self) -> Drain<T> { pub fn drain(&mut self) -> Drain<T> {
unsafe { let iter = unsafe { RawValIter::new(&self) };
let iter = RawValIter::new(&self);
// this is a mem::forget safety thing. If Drain is forgotten, we just // this is a mem::forget safety thing. If Drain is forgotten, we just
// leak the whole Vec's contents. Also we need to do this *eventually* // leak the whole Vec's contents. Also we need to do this *eventually*
// anyway, so why not do it now? // anyway, so why not do it now?
self.len = 0; self.len = 0;
Drain { Drain {
iter: iter, iter,
vec: PhantomData, vec: PhantomData,
}
} }
} }
} }
@ -199,15 +198,15 @@ impl<T> IntoIterator for Vec<T> {
type Item = T; type Item = T;
type IntoIter = IntoIter<T>; type IntoIter = IntoIter<T>;
fn into_iter(self) -> IntoIter<T> { fn into_iter(self) -> IntoIter<T> {
unsafe { let (iter, buf) = unsafe {
let iter = RawValIter::new(&self); (RawValIter::new(&self), ptr::read(&self.buf))
let buf = ptr::read(&self.buf); };
mem::forget(self);
IntoIter { mem::forget(self);
iter: iter,
_buf: buf, IntoIter {
} iter,
_buf: buf,
} }
} }
} }

@ -18,16 +18,19 @@ pub fn insert(&mut self, index: usize, elem: T) {
// Note: `<=` because it's valid to insert after everything // Note: `<=` because it's valid to insert after everything
// which would be equivalent to push. // which would be equivalent to push.
assert!(index <= self.len, "index out of bounds"); assert!(index <= self.len, "index out of bounds");
if self.cap == self.len { self.grow(); } if self.len == self.cap { self.grow(); }
unsafe { unsafe {
// ptr::copy(src, dest, len): "copy from src to dest len elems" // ptr::copy(src, dest, len): "copy from src to dest len elems"
ptr::copy(self.ptr.as_ptr().add(index), ptr::copy(
self.ptr.as_ptr().add(index + 1), self.ptr.as_ptr().add(index),
self.len - index); self.ptr.as_ptr().add(index + 1),
self.len - index,
);
ptr::write(self.ptr.as_ptr().add(index), elem); ptr::write(self.ptr.as_ptr().add(index), elem);
self.len += 1;
} }
self.len += 1;
} }
``` ```
@ -42,9 +45,11 @@ pub fn remove(&mut self, index: usize) -> T {
unsafe { unsafe {
self.len -= 1; self.len -= 1;
let result = ptr::read(self.ptr.as_ptr().add(index)); let result = ptr::read(self.ptr.as_ptr().add(index));
ptr::copy(self.ptr.as_ptr().add(index + 1), ptr::copy(
self.ptr.as_ptr().add(index), self.ptr.as_ptr().add(index + 1),
self.len - index); self.ptr.as_ptr().add(index),
self.len - index,
);
result result
} }
} }
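The element shifting done by `insert` and `remove` in the hunks above can be tried in isolation. The following is a minimal sketch, assuming a plain `std::vec::Vec<i32>` as the backing storage instead of the chapter's hand-written `Vec`; the helper names `insert_at` and `remove_at` are hypothetical and not part of the book.

```rust
use std::ptr;

// Hypothetical helper: shift the tail right by one slot, then write `elem`.
pub fn insert_at(v: &mut Vec<i32>, index: usize, elem: i32) {
    // `<=` because inserting after the last element is equivalent to push.
    assert!(index <= v.len(), "index out of bounds");
    // Make sure there is room for one more element before touching raw memory.
    v.reserve(1);
    let len = v.len();
    unsafe {
        let p = v.as_mut_ptr();
        // ptr::copy(src, dest, count) handles overlapping ranges like memmove.
        ptr::copy(p.add(index), p.add(index + 1), len - index);
        ptr::write(p.add(index), elem);
        v.set_len(len + 1);
    }
}

// Hypothetical helper: read the element out, then shift the tail left by one.
pub fn remove_at(v: &mut Vec<i32>, index: usize) -> i32 {
    assert!(index < v.len(), "index out of bounds");
    let new_len = v.len() - 1;
    unsafe {
        let p = v.as_mut_ptr();
        let result = ptr::read(p.add(index));
        ptr::copy(p.add(index + 1), p.add(index), new_len - index);
        v.set_len(new_len);
        result
    }
}
```

In this sketch the length update stays inside `unsafe` because `std`'s `set_len` is itself an unsafe function; in the chapter's code `self.len += 1` is a plain field write, which is why the diff can move it out of the block.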

@ -49,7 +49,6 @@ pub struct IntoIter<T> {
cap: usize, cap: usize,
start: *const T, start: *const T,
end: *const T, end: *const T,
_marker: PhantomData<T>,
} }
``` ```
@ -61,27 +60,24 @@ impl<T> IntoIterator for Vec<T> {
type Item = T; type Item = T;
type IntoIter = IntoIter<T>; type IntoIter = IntoIter<T>;
fn into_iter(self) -> IntoIter<T> { fn into_iter(self) -> IntoIter<T> {
// Can't destructure Vec since it's Drop
let ptr = self.ptr;
let cap = self.cap;
let len = self.len;
// Make sure not to drop Vec since that would free the buffer // Make sure not to drop Vec since that would free the buffer
mem::forget(self); let vec = ManuallyDrop::new(self);
unsafe { // Can't destructure Vec since it's Drop
IntoIter { let ptr = vec.ptr;
buf: ptr, let cap = vec.cap;
cap: cap, let len = vec.len;
start: ptr.as_ptr(),
end: if cap == 0 { IntoIter {
// can't offset off this pointer, it's not allocated! buf: ptr,
ptr.as_ptr() cap,
} else { start: ptr.as_ptr(),
ptr.as_ptr().add(len) end: if cap == 0 {
}, // can't offset off this pointer, it's not allocated!
_marker: PhantomData, ptr.as_ptr()
} } else {
unsafe { ptr.as_ptr().add(len) }
},
} }
} }
} }
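The switch from `mem::forget(self)` to `ManuallyDrop::new(self)` in the hunk above can be illustrated with a smaller type. This is a sketch under stated assumptions: `Buffer`, `from_vec`, and `into_raw_parts` are hypothetical names, and the buffer is a boxed slice rather than the chapter's raw allocation.

```rust
use std::mem::ManuallyDrop;
use std::ptr;

// Hypothetical owner of a heap buffer, standing in for the chapter's `Vec<T>`.
pub struct Buffer {
    ptr: *mut u32,
    len: usize,
}

impl Buffer {
    pub fn from_vec(v: Vec<u32>) -> Self {
        let boxed = v.into_boxed_slice();
        let len = boxed.len();
        // Leak the box and keep only a thin pointer to its first element.
        let ptr = Box::into_raw(boxed) as *mut u32;
        Buffer { ptr, len }
    }

    // Hand the buffer to the caller without running `Drop`.
    pub fn into_raw_parts(self) -> (*mut u32, usize) {
        // Wrapping first means no destructor can run, even if code added
        // between here and the return were to panic.
        let this = ManuallyDrop::new(self);
        // Can't destructure `Buffer` since it implements Drop; copy fields out.
        (this.ptr, this.len)
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // Rebuild the boxed slice so the allocation is freed exactly once.
        unsafe {
            drop(Box::from_raw(ptr::slice_from_raw_parts_mut(self.ptr, self.len)));
        }
    }
}
```

After `into_raw_parts`, the caller is responsible for freeing the buffer, in the same way that `IntoIter` takes over ownership of the allocation in the chapter.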

@ -15,13 +15,10 @@ pub struct Vec<T> {
} }
``` ```
And indeed this would compile. Unfortunately, it would be incorrect. First, the And indeed this would compile. Unfortunately, it would be too strict. The
compiler will give us too strict variance. So a `&Vec<&'static str>` compiler will give us too strict variance. So a `&Vec<&'static str>`
couldn't be used where an `&Vec<&'a str>` was expected. More importantly, it couldn't be used where a `&Vec<&'a str>` was expected. See [the chapter
will give incorrect ownership information to the drop checker, as it will on ownership and lifetimes][ownership] for all the details on variance.
conservatively assume we don't own any values of type `T`. See [the chapter
on ownership and lifetimes][ownership] for all the details on variance and
drop check.
As we saw in the ownership chapter, the standard library uses `Unique<T>` in place of As we saw in the ownership chapter, the standard library uses `Unique<T>` in place of
`*mut T` when it has a raw pointer to an allocation that it owns. Unique is unstable, `*mut T` when it has a raw pointer to an allocation that it owns. Unique is unstable,
@ -30,26 +27,24 @@ so we'd like to not use it if possible, though.
As a recap, Unique is a wrapper around a raw pointer that declares that: As a recap, Unique is a wrapper around a raw pointer that declares that:
* We are covariant over `T` * We are covariant over `T`
* We may own a value of type `T` (for drop check) * We may own a value of type `T` (this is not relevant for our example here, but see
[the chapter on PhantomData][phantom-data] on why the real `std::vec::Vec<T>` needs this)
* We are Send/Sync if `T` is Send/Sync * We are Send/Sync if `T` is Send/Sync
* Our pointer is never null (so `Option<Vec<T>>` is null-pointer-optimized) * Our pointer is never null (so `Option<Vec<T>>` is null-pointer-optimized)
We can implement all of the above requirements in stable Rust. To do this, instead We can implement all of the above requirements in stable Rust. To do this, instead
of using `Unique<T>` we will use [`NonNull<T>`][NonNull], another wrapper around a of using `Unique<T>` we will use [`NonNull<T>`][NonNull], another wrapper around a
raw pointer, which gives us two of the above properties, namely it is covariant raw pointer, which gives us two of the above properties, namely it is covariant
over `T` and is declared to never be null. By adding a `PhantomData<T>` (for drop over `T` and is declared to never be null. By implementing Send/Sync if `T` is,
check) and implementing Send/Sync if `T` is, we get the same results as using we get the same results as using `Unique<T>`:
`Unique<T>`:
```rust ```rust
use std::ptr::NonNull; use std::ptr::NonNull;
use std::marker::PhantomData;
pub struct Vec<T> { pub struct Vec<T> {
ptr: NonNull<T>, ptr: NonNull<T>,
cap: usize, cap: usize,
len: usize, len: usize,
_marker: PhantomData<T>,
} }
unsafe impl<T: Send> Send for Vec<T> {} unsafe impl<T: Send> Send for Vec<T> {}
@ -58,4 +53,5 @@ unsafe impl<T: Sync> Sync for Vec<T> {}
``` ```
[ownership]: ../ownership.html [ownership]: ../ownership.html
[phantom-data]: ../phantom-data.md
[NonNull]: ../../std/ptr/struct.NonNull.html [NonNull]: ../../std/ptr/struct.NonNull.html
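The variance claim in the revised text can be checked with a small compile-time experiment. This is a sketch, assuming a stand-in type named `MyVec` (hypothetical, to avoid clashing with `std`'s `Vec`) and two hypothetical helpers, `shorten` and `assert_send_sync`.

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for the chapter's `Vec<T>`; only the layout matters.
#[allow(dead_code)]
pub struct MyVec<T> {
    ptr: NonNull<T>,
    cap: usize,
    len: usize,
}

unsafe impl<T: Send> Send for MyVec<T> {}
unsafe impl<T: Sync> Sync for MyVec<T> {}

impl<T> MyVec<T> {
    pub fn new() -> Self {
        MyVec { ptr: NonNull::dangling(), cap: 0, len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

// Compiles only because `NonNull<T>` makes `MyVec<T>` covariant over `T`:
// a `&MyVec<&'static str>` is accepted where a `&MyVec<&'a str>` is expected.
// With a `*mut T` field the type would be invariant and this would be rejected.
pub fn shorten<'a>(v: &'a MyVec<&'static str>) -> &'a MyVec<&'a str> {
    v
}

// Compile-time check that the manual impls provide Send and Sync.
pub fn assert_send_sync<T: Send + Sync>() {}
```

No `PhantomData<T>` field is needed for either check to pass, which matches the direction of this diff.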

@ -13,7 +13,6 @@ allocating, growing, and freeing:
struct RawVec<T> { struct RawVec<T> {
ptr: NonNull<T>, ptr: NonNull<T>,
cap: usize, cap: usize,
_marker: PhantomData<T>,
} }
unsafe impl<T: Send> Send for RawVec<T> {} unsafe impl<T: Send> Send for RawVec<T> {}
@ -25,23 +24,17 @@ impl<T> RawVec<T> {
RawVec { RawVec {
ptr: NonNull::dangling(), ptr: NonNull::dangling(),
cap: 0, cap: 0,
_marker: PhantomData,
} }
} }
fn grow(&mut self) { fn grow(&mut self) {
let (new_cap, new_layout) = if self.cap == 0 { // This can't overflow because we ensure self.cap <= isize::MAX.
(1, Layout::array::<T>(1).unwrap()) let new_cap = if self.cap == 0 { 1 } else { 2 * self.cap };
} else {
// This can't overflow because we ensure self.cap <= isize::MAX. // Layout::array checks that the number of bytes is <= usize::MAX,
let new_cap = 2 * self.cap; // but this is redundant since old_layout.size() <= isize::MAX,
// so the `unwrap` should never fail.
// Layout::array checks that the number of bytes is <= usize::MAX, let new_layout = Layout::array::<T>(new_cap).unwrap();
// but this is redundant since old_layout.size() <= isize::MAX,
// so the `unwrap` should never fail.
let new_layout = Layout::array::<T>(new_cap).unwrap();
(new_cap, new_layout)
};
// Ensure that the new allocation doesn't exceed `isize::MAX` bytes. // Ensure that the new allocation doesn't exceed `isize::MAX` bytes.
assert!(new_layout.size() <= isize::MAX as usize, "Allocation too large"); assert!(new_layout.size() <= isize::MAX as usize, "Allocation too large");
@ -138,23 +131,21 @@ impl<T> IntoIterator for Vec<T> {
type Item = T; type Item = T;
type IntoIter = IntoIter<T>; type IntoIter = IntoIter<T>;
fn into_iter(self) -> IntoIter<T> { fn into_iter(self) -> IntoIter<T> {
unsafe { // need to use ptr::read to unsafely move the buf out since it's
// need to use ptr::read to unsafely move the buf out since it's // not Copy, and Vec implements Drop (so we can't destructure it).
// not Copy, and Vec implements Drop (so we can't destructure it). let buf = unsafe { ptr::read(&self.buf) };
let buf = ptr::read(&self.buf); let len = self.len;
let len = self.len; mem::forget(self);
mem::forget(self);
IntoIter {
IntoIter { start: buf.ptr.as_ptr(),
start: buf.ptr.as_ptr(), end: if buf.cap == 0 {
end: if buf.cap == 0 { // can't offset off of a pointer unless it's part of an allocation
// can't offset off of a pointer unless it's part of an allocation buf.ptr.as_ptr()
buf.ptr.as_ptr() } else {
} else { unsafe { buf.ptr.as_ptr().add(len) }
buf.ptr.as_ptr().add(len) },
}, _buf: buf,
_buf: buf,
}
} }
} }
} }
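The simplified capacity computation from the `grow` hunk earlier in this file can be exercised on its own. The sketch below assumes a free function named `next_layout` (hypothetical) that only computes the new capacity and layout and performs no allocation.

```rust
use std::alloc::Layout;

// Hypothetical helper mirroring the first half of `RawVec::grow`.
pub fn next_layout<T>(cap: usize) -> (usize, Layout) {
    // Start at 1, then double. As in the chapter, doubling cannot overflow
    // as long as the caller keeps `cap <= isize::MAX`.
    let new_cap = if cap == 0 { 1 } else { 2 * cap };

    // Layout::array fails if the total size in bytes would overflow.
    let new_layout = Layout::array::<T>(new_cap).unwrap();

    // Ensure that the new allocation doesn't exceed `isize::MAX` bytes.
    assert!(new_layout.size() <= isize::MAX as usize, "Allocation too large");

    (new_cap, new_layout)
}
```

Collapsing the two branches into one works because `Layout::array::<T>(1)` is just the `new_cap == 1` case of the general call.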

@ -33,14 +33,13 @@ method of `RawVec`.
```rust,ignore ```rust,ignore
impl<T> RawVec<T> { impl<T> RawVec<T> {
fn new() -> Self { fn new() -> Self {
// !0 is usize::MAX. This branch should be stripped at compile time. // This branch should be stripped at compile time.
let cap = if mem::size_of::<T>() == 0 { !0 } else { 0 }; let cap = if mem::size_of::<T>() == 0 { usize::MAX } else { 0 };
// `NonNull::dangling()` doubles as "unallocated" and "zero-sized allocation" // `NonNull::dangling()` doubles as "unallocated" and "zero-sized allocation"
RawVec { RawVec {
ptr: NonNull::dangling(), ptr: NonNull::dangling(),
cap: cap, cap,
_marker: PhantomData,
} }
} }

@ -71,8 +71,7 @@ Rust considers it "safe" to:
* Deadlock * Deadlock
* Have a [race condition][race] * Have a [race condition][race]
* Leak memory * Leak memory
* Fail to call destructors * Overflow integers (with the built-in operators such as `+` etc.)
* Overflow integers
* Abort the program * Abort the program
* Delete the production database * Delete the production database
