diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index db72286..3a86a45 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -19,6 +19,7 @@
     * [Higher-Rank Trait Bounds](hrtb.md)
     * [Subtyping and Variance](subtyping.md)
     * [Drop Check](dropck.md)
+    * [Drop Check Escape Patch](dropck-eyepatch.md)
     * [PhantomData](phantom-data.md)
     * [Splitting Borrows](borrow-splitting.md)
 * [Type Conversions](conversions.md)
diff --git a/src/destructors.md b/src/destructors.md
index 5b3f546..eb4eb57 100644
--- a/src/destructors.md
+++ b/src/destructors.md
@@ -17,36 +17,36 @@ boilerplate" to drop children. If a struct has no special logic for being
 dropped other than dropping its children, then it means `Drop` doesn't need to
 be implemented at all!
 
-**There is no stable way to prevent this behavior in Rust 1.0.**
+If this behaviour is unacceptable, it can be suppressed by placing each field
+you don't want to drop in a `union`. The standard library provides the
+[`mem::ManuallyDrop`][ManuallyDrop] wrapper type as a convenience for doing this.
 
-Note that taking `&mut self` means that even if you could suppress recursive
-Drop, Rust will prevent you from e.g. moving fields out of self. For most types,
-this is totally fine.
 
-For instance, a custom implementation of `Box` might write `Drop` like this:
+
+Consider a custom implementation of `Box`, which might write `Drop` like this:
 
 ```rust
 #![feature(unique, allocator_api)]
 
 use std::heap::{Heap, Alloc, Layout};
 use std::mem;
-use std::ptr::{drop_in_place, Unique};
+use std::ptr::drop_in_place;
 
-struct Box<T>{ ptr: Unique<T> }
+struct Box<T>{ ptr: *mut T }
 
 impl<T> Drop for Box<T> {
     fn drop(&mut self) {
         unsafe {
-            drop_in_place(self.ptr.as_ptr());
-            Heap.dealloc(self.ptr.as_ptr() as *mut u8, Layout::new::<T>())
+            drop_in_place(self.ptr);
+            Heap.dealloc(self.ptr as *mut u8, Layout::new::<T>())
         }
     }
 }
 # fn main() {}
 ```
 
-and this works fine because when Rust goes to drop the `ptr` field it just sees
-a [Unique] that has no actual `Drop` implementation. Similarly nothing can
+This works fine because when Rust goes to drop the `ptr` field it just sees
+a `*mut T` that has no actual `Drop` implementation. Similarly nothing can
 use-after-free the `ptr` because when drop exits, it becomes inaccessible.
 
 However this wouldn't work:
@@ -55,16 +55,16 @@ However this wouldn't work:
 #![feature(allocator_api, unique)]
 
 use std::heap::{Heap, Alloc, Layout};
-use std::ptr::{drop_in_place, Unique};
+use std::ptr::drop_in_place;
 use std::mem;
 
-struct Box<T>{ ptr: Unique<T> }
+struct Box<T>{ ptr: *mut T }
 
 impl<T> Drop for Box<T> {
     fn drop(&mut self) {
         unsafe {
-            drop_in_place(self.ptr.as_ptr());
-            Heap.dealloc(self.ptr.as_ptr() as *mut u8, Layout::new::<T>());
+            drop_in_place(self.ptr);
+            Heap.dealloc(self.ptr as *mut u8, Layout::new::<T>());
         }
     }
 }
@@ -74,17 +74,17 @@ struct SuperBox<T> { my_box: Box<T> }
 impl<T> Drop for SuperBox<T> {
     fn drop(&mut self) {
         unsafe {
-            // Hyper-optimized: deallocate the box's contents for it
+            // """Hyper-optimized""": deallocate the box's contents for it
             // without `drop`ing the contents
-            Heap.dealloc(self.my_box.ptr.as_ptr() as *mut u8, Layout::new::<T>());
+            Heap.dealloc(self.my_box.ptr as *mut u8, Layout::new::<T>());
         }
     }
 }
 # fn main() {}
 ```
 
-After we deallocate the `box`'s ptr in SuperBox's destructor, Rust will
-happily proceed to tell the box to Drop itself and everything will blow up with
+After we deallocate `my_box`'s ptr in SuperBox's destructor, Rust will
+happily proceed to tell `my_box` to Drop itself and everything will blow up with
 use-after-frees and double-frees.
 
 Note that the recursive drop behavior applies to all structs and enums
@@ -98,9 +98,10 @@ struct Boxy<T> {
 }
 ```
 
-will have its data1 and data2's fields destructors whenever it "would" be
+will have its `data1` and `data2` fields' destructors run whenever it "would" be
 dropped, even though it itself doesn't implement Drop. We say that such a type
-*needs Drop*, even though it is not itself Drop.
+*needs Drop*, even though it is not itself Drop. This property can be checked
+for with the [`mem::needs_drop()`][needs_drop] function.
 
 Similarly,
 
@@ -115,27 +116,27 @@ will have its inner Box field dropped if and only if an instance stores the
 Next variant.
 
 In general this works really nicely because you don't need to worry about
-adding/removing drops when you refactor your data layout. Still there's
+adding/removing drops when you refactor your data layout. But there's
 certainly many valid usecases for needing to do trickier things with
 destructors.
 
-The classic safe solution to overriding recursive drop and allowing moving out
+The classic safe solution to preventing recursive drop and allowing moving out
 of Self during `drop` is to use an Option:
 
 ```rust
 #![feature(allocator_api, unique)]
 
 use std::heap::{Alloc, Heap, Layout};
-use std::ptr::{drop_in_place, Unique};
+use std::ptr::drop_in_place;
 use std::mem;
 
-struct Box<T>{ ptr: Unique<T> }
+struct Box<T>{ ptr: *mut T }
 
 impl<T> Drop for Box<T> {
     fn drop(&mut self) {
         unsafe {
-            drop_in_place(self.ptr.as_ptr());
-            Heap.dealloc(self.ptr.as_ptr() as *mut u8, Layout::new::<T>());
+            drop_in_place(self.ptr);
+            Heap.dealloc(self.ptr as *mut u8, Layout::new::<T>());
         }
     }
 }
@@ -149,7 +150,7 @@ impl<T> Drop for SuperBox<T> {
             // without `drop`ing the contents. Need to set the `box`
             // field as `None` to prevent Rust from trying to Drop it.
             let my_box = self.my_box.take().unwrap();
-            Heap.dealloc(my_box.ptr.as_ptr() as *mut u8, Layout::new::<T>());
+            Heap.dealloc(my_box.ptr as *mut u8, Layout::new::<T>());
             mem::forget(my_box);
         }
     }
@@ -165,7 +166,10 @@ deinitializing the field. Not that it will prevent you from producing any other
 arbitrarily invalid state in there.
 
 On balance this is an ok choice. Certainly what you should reach for by default.
-However, in the future we expect there to be a first-class way to announce that
-a field shouldn't be automatically dropped.
 
-[Unique]: phantom-data.html
+Should using Option be unacceptable, [`ManuallyDrop`][ManuallyDrop] is always
+available.
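As a rough illustration of the `ManuallyDrop` route, the sketch below wraps one field in `ManuallyDrop` so that the automatic recursive drop skips it, then shows how the destructor can still be run by hand. The `Noisy` and `Pair` types and both function names are hypothetical, made up for this example; only `std::mem::ManuallyDrop` and `std::cell::Cell` come from the standard library. It targets stable Rust and deliberately avoids the unstable allocator API used in the examples above.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

// Hypothetical helper: bumps a shared counter every time it is dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// `kept` is dropped automatically; `suppressed` is skipped by the
// recursive drop because it is wrapped in `ManuallyDrop`.
struct Pair<'a> {
    kept: Noisy<'a>,
    suppressed: ManuallyDrop<Noisy<'a>>,
}

// Counts how many destructors run when a `Pair` simply goes out of scope.
fn drops_after_pair_goes_away() -> u32 {
    let counter = Cell::new(0);
    {
        let _pair = Pair {
            kept: Noisy(&counter),
            suppressed: ManuallyDrop::new(Noisy(&counter)),
        };
    } // only `kept` has its destructor run here
    counter.get()
}

// Same as above, but the suppressed field is dropped by hand first.
fn drops_with_explicit_drop() -> u32 {
    let counter = Cell::new(0);
    {
        let mut pair = Pair {
            kept: Noisy(&counter),
            suppressed: ManuallyDrop::new(Noisy(&counter)),
        };
        // Safety: `pair.suppressed` is never touched again after this call.
        unsafe { ManuallyDrop::drop(&mut pair.suppressed) };
    }
    counter.get()
}

fn main() {
    println!("automatic only: {}", drops_after_pair_goes_away());
    println!("with explicit drop: {}", drops_with_explicit_drop());
}
```

Compared with the `Option` approach, nothing here tracks whether the field has already been dropped, so calling `ManuallyDrop::drop` twice, or using the field afterwards, is undefined behaviour. That is the trade-off for skipping the `take().unwrap()` dance.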
+
+
+[ManuallyDrop]: https://doc.rust-lang.org/std/mem/union.ManuallyDrop.html
+[needs_drop]: https://doc.rust-lang.org/nightly/std/mem/fn.needs_drop.html
diff --git a/src/drop-flags.md b/src/drop-flags.md
index e69264f..dc00f24 100644
--- a/src/drop-flags.md
+++ b/src/drop-flags.md
@@ -1,7 +1,6 @@
 # Drop Flags
 
-The examples in the previous section introduce an interesting problem for Rust.
-We have seen that it's possible to conditionally initialize, deinitialize, and
+We've seen that it's possible to conditionally initialize, deinitialize, and
 reinitialize locations of memory totally safely. For Copy types, this isn't
 particularly notable since they're just a random pile of bits. However types
 with destructors are a different story: Rust needs to know whether to call a
@@ -79,5 +78,8 @@ if condition {
 }
 ```
 
-The drop flags are tracked on the stack and no longer stashed in types that
-implement drop.
+At Rust 1.0, these flags were stored in the actual values that needed to
+be tracked. This was a big mess and you had to worry about it in unsafe code.
+
+As of Rust 1.13, these flags are stored separately on the stack, so you no
+longer need to worry about them.
diff --git a/src/dropck-eyepatch.md b/src/dropck-eyepatch.md
new file mode 100644
index 0000000..62a4539
--- /dev/null
+++ b/src/dropck-eyepatch.md
@@ -0,0 +1,434 @@
+# Drop Check: The Escape Patch
+
+In spite of everything stated in the previous section, this code compiles:
+
+```rust
+fn main() {
+    let (day, inspector);
+    day = Box::new(0);
+    inspector = vec![&*day];
+    println!("{:?}", inspector);
+}
+```
+
+Here instead of storing a reference in our own custom type, we use a Vec, which
+is of course generic and implements Drop. Surprisingly, the compiler thinks it's
+fine that the Vec stores a reference that lives exactly as long as it.
+
+What happened?
+
+Well, there's an unsafe escape hatch, and the standard library uses it for its
+collections and owning pointer types such as Box, Rc, Vec, and BTreeMap. With
+this escape hatch, we can tell the compiler that we *promise* that it's safe
+for a given generic argument to dangle.
+
+Before we proceed, I must emphasize that this is an incredibly obscure feature.
+As in, many people who work on the Rust standard library and rustc itself don't
+even know that this exists. So in all likelihood, no one will notice if you
+don't use this feature.
+
+**In other words: please don't use the unstable feature we're about to describe!**
+
+This feature primarily exists because some tricky parts of rustc itself use it.
+We document it here primarily for the purposes of maintaining the rust-lang
+codebase itself.
+
+So let's say we want to write this modified inspector code:
+
+```rust,ignore
+struct Inspector<'a, T: 'a> { data: &'a T }
+
+impl<'a, T> Drop for Inspector<'a, T> {
+    fn drop(&mut self) {
+        println!("I was only one day from retirement!");
+    }
+}
+
+fn main() {
+    let (data, inspector);
+    data = Box::new(0u8);
+    inspector = Inspector { data: &*data };
+}
+```
+
+This Inspector is perfectly safe, because it doesn't actually access its
+generic data in its destructor. Sadly, the code still doesn't compile:
+
+```text
+error: `*data` does not live long enough
+  --> src/main.rs:13:1
+   |
+12 |     inspector = Inspector { data: &*data };
+   |                                    ----- borrow occurs here
+13 | }
+   | ^ `*data` dropped here while still borrowed
+   |
+   = note: values in a scope are dropped in the opposite order they are created
+```
+
+This is because as far as Rust is concerned you *could have* accessed it, and
+Rust refuses to inspect your drop implementation to be sure.
+
+With [the eyepatch RFC][eyepatch], we can *partially blind* dropck, by hiding one of our
+generic parameters from it. (...by covering it with a patch. Get it? ...Eyepatch?)
+
+The patch we apply is as follows:
+
+```rust
+#![feature(generic_param_attrs, dropck_eyepatch)]
+
+struct Inspector<'a, T: 'a> { data: &'a T }
+
+// Changes here:
+unsafe impl<#[may_dangle] 'a, T> Drop for Inspector<'a, T> {
+    fn drop(&mut self) {
+        println!("I was only one day from retirement!");
+    }
+}
+
+fn main() {
+    let (data, inspector);
+    data = Box::new(0u8);
+    inspector = Inspector { data: &*data };
+}
+```
+
+...and it compiles and runs!
+
+There are two changes:
+
+* We add `#[may_dangle]` to one of our generic parameters (here, the lifetime `'a`)
+* We add `unsafe` to the impl block (which is required by may_dangle to emphasize the risks)
+
+Note also that `#[may_dangle]` requires both the `generic_param_attrs` and the
+`dropck_eyepatch` features.
+
+The may_dangle attribute tells dropck to ignore `'a` in its analysis. Since this was
+the only reason the Inspector was considered unsound (T is just a u8), our code compiles.
+
+`#[may_dangle]` can also be applied to a type parameter. For instance, if we change
+`data` to just `T` (so `T = &'a u8`), then we need to blind dropck from `T`:
+
+```rust
+#![feature(generic_param_attrs, dropck_eyepatch)]
+
+struct Inspector<T> { data: T }
+
+// Changes here:
+unsafe impl<#[may_dangle] T> Drop for Inspector<T> {
+    fn drop(&mut self) {
+        println!("I was only one day from retirement!");
+    }
+}
+
+fn main() {
+    let (data, inspector);
+    data = Box::new(0u8);
+    inspector = Inspector { data: &*data };
+}
+```
+
+
+
+
+# When The Eyepatch is (Un)Sound
+
+The general rule for when it's safe to apply the dropck eyepatch to a type parameter
+`T` is that the destructor must only do things to values of type `T` that could be
+done with *all* types. Basically: we can move (or copy) the values around, take
+references to them, get their size/align, and drop them. Just to be clear
+on why these are fine:
+
+* Moving and copying are just bitwise, and it's perfectly safe to copy the bits
+  representing a dangling pointer.
+
+* Static size/align computation (as with `size_of`) doesn't involve actually
+  looking at instances of the type, so dangling doesn't matter.
+
+* Dynamic size/align computation (as with `size_of_val`) is also fine, because
+  it only looks at the trait object's vtable. This vtable is statically
+  allocated, and can be found without looking at the actual instance's data.
+
+* Dropping a pointer is a noop, so it doesn't matter if it's actually
+  dangling.
+
+In theory, a function that's generic over all `T` (like `mem::replace`) must also
+follow these rules, but in a world with specialization that isn't necessarily true.
+For instance any totally-generic function may specialize on `T: Display` to print
+the values when possible (please file 100 bugs if `mem::replace` ever does this).
+
+Also note that the following closure isn't actually generic over all values of
+type `T`; its body knows the exact type of `T` and therefore can dereference
+any dangling pointers `T` might contain:
+
+```rust,ignore
+impl<T, F> where F: Fn(T) { ... }
+```
+
+All `Vec` and friends do in their destructors is traverse themselves using their
+own structure, drop all of the `T`'s they contain, and free themselves. This is
+why it's sound for them to apply the eyepatch to their parameters.
+
+
+
+
+
+
+# When The Eyepatch Needs Help
+
+Applying the eyepatch correctly isn't sufficient to get sound drop checking.
+To see why, consider this example:
+
+```rust
+#![feature(generic_param_attrs, dropck_eyepatch)]
+
+use std::fmt::Debug;
+
+struct Inspector<T: Debug> { data: T }
+
+// Doesn't use eyepatch, but clearly looks at its payload. This is fine.
+// Dropck will correctly require that this strictly outlives its payload.
+impl<T: Debug> Drop for Inspector<T> {
+    fn drop(&mut self) {
+        println!("I was only {:?} days from retirement!", self.data);
+    }
+}
+
+// Our own custom implementation of Box.
+struct MyBox<T> {
+    data: *mut T,
+}
+
+// This is uninteresting
+impl<T> MyBox<T> {
+    fn new(t: T) -> MyBox<T> {
+        MyBox { data: Box::into_raw(Box::new(t)) }
+    }
+}
+
+// The stdlib's Box impl uses may_dangle, so it should be fine for us!
+// (This is true... almost)
+unsafe impl<#[may_dangle] T> Drop for MyBox<T> {
+    fn drop(&mut self) {
+        unsafe { Box::from_raw(self.data); }
+    }
+}
+
+fn inspect() {
+    let (data, inspector);
+
+    // We store this in a std box to avoid distractions
+    data = Box::new(7u8);
+
+    // This time we store an Inspector in our custom Box type
+    inspector = MyBox::new(Inspector { data: &*data });
+
+    // !!! If this compiles, the Inspector will read the dangling data here !!!
+}
+```
+
+
+**This compiles, and will perform a use-after-free.**
+
+Something has gone wrong. Just to check, let's replace our use of MyBox with
+std's Box.
+
+```rust,ignore
+fn inspect() {
+    let (data, inspector);
+
+    data = Box::new(7u8);
+
+    // This time we use std's Box type
+    inspector = Box::new(Inspector { data: &*data });
+
+    // !!! If this compiles, the Inspector will read the dangling data here !!!
+}
+```
+
+```text
+error[E0597]: `*data` does not live long enough
+  --> src/main.rs:45:1
+   |
+42 |     inspector = Box::new(Inspector { data: &*data });
+   |                                             ----- borrow occurs here
+...
+45 | }
+   | ^ `*data` dropped here while still borrowed
+   |
+   = note: values in a scope are dropped in the opposite order they are created
+```
+
+Somehow, dropck now notices that we're doing something bad, and catches us.
+
+Here's the problem: dropck doesn't know that MyBox will drop an Inspector, and
+it really needs to know that to perform a proper analysis. To understand the
+analysis it's trying to perform, let's step back to the "generic but no
+destructor" case:
+
+
+```rust,ignore
+// Our own custom implementation of a Box, which doesn't actually box.
+struct MyFakeBox<T> {
+    data: T,
+}
+
+// This is uninteresting
+impl<T> MyFakeBox<T> {
+    fn new(t: T) -> MyFakeBox<T> {
+        MyFakeBox { data: t }
+    }
+}
+
+fn inspect() {
+    let (data, inspector);
+
+    data = Box::new(7u8);
+
+    // This time we store an Inspector in our custom FakeBox type
+    inspector = MyFakeBox::new(Inspector { data: &*data });
+
+    // !!! If this compiles, the Inspector will read the dangling data here !!!
+}
+```
+
+```text
+error[E0597]: `*data` does not live long enough
+  --> src/main.rs:37:1
+   |
+34 |     inspector = MyFakeBox::new(Inspector { data: &*data });
+   |                                                   ----- borrow occurs here
+...
+37 | }
+   | ^ `*data` dropped here while still borrowed
+   |
+   = note: values in a scope are dropped in the opposite order they are created
+```
+
+Ok, without a destructor the compiler also performs the right analysis, even
+though it should let MyFakeBox contain strictly equal lifetimes when possible.
+How does it know that an Inspector will be dropped when a MyFakeBox will be?
+
+Quite simply: it looks at MyFakeBox's fields.
+
+Without an explicit destructor, the compiler is the one providing the destructor
+implementation, and so it knows exactly what will be dropped: the fields.
+
+It turns out that this is also the exact analysis that is applied to our MyBox
+type. The *problem* is that MyBox stores a `*mut T`, and the compiler knows
+dropping a `*mut T` is a noop. So it decides no Inspectors are involved in the
+destruction of a MyBox, and lets this code compile (which would be correct if
+that conclusion were true).
+
+
+
+
+# Fixing Your Eyepatches
+
+The solution to this problem is fairly simple: if the compiler is going to check
+our fields for what we drop, let's add some more fields!
+
+In particular, we will use the PhantomData type to simulate a stored T.
+(We'll discuss PhantomData more in the next section. For now, take it for
+granted.)
+
+```rust,ignore
+use std::marker::PhantomData;
+
+// Our own custom implementation of Box.
+struct MyBox<T> {
+    data: *mut T,
+    _boo: PhantomData<T>, // Tell the compiler we drop a T!
+}
+
+// This is still uninteresting
+impl<T> MyBox<T> {
+    fn new(t: T) -> MyBox<T> {
+        MyBox {
+            data: Box::into_raw(Box::new(t)),
+            _boo: PhantomData,
+        }
+    }
+}
+
+// Completely unchanged! We got this part right!
+unsafe impl<#[may_dangle] T> Drop for MyBox<T> {
+    fn drop(&mut self) {
+        unsafe { Box::from_raw(self.data); }
+    }
+}
+
+fn inspect() {
+    let (data, inspector);
+
+    data = Box::new(7u8);
+
+    // Back to our MyBox type
+    inspector = MyBox::new(Inspector { data: &*data });
+
+    // !!! If this compiles, the Inspector will read the dangling data here !!!
+}
+```
+
+```text
+   Compiling playground v0.0.1 (file:///playground)
+error[E0597]: `*data` does not live long enough
+  --> src/main.rs:50:1
+   |
+47 |     inspector = MyBox::new(Inspector { data: &*data });
+   |                                               ----- borrow occurs here
+...
+50 | }
+   | ^ `*data` dropped here while still borrowed
+   |
+   = note: values in a scope are dropped in the opposite order they are created
+```
+
+Hurray! It worked!
+
+And just to check that we can still store dangling things when it's sound:
+
+```rust,ignore
+fn inspect() {
+    let (data, non_inspector);
+
+    data = Box::new(7u8);
+
+    // Our custom box type, but no inspector.
+    non_inspector = MyBox::new(&*data);
+}
+```
+
+Compiles fine! Great! 🎉
+
+
+
+
+# Dropck Eyepatch Summary (TL;DR)
+
+When a generic type provides a destructor, the compiler will conservatively
+disallow any of the type parameters living exactly as long as that type.
+
+With the dropck eyepatch, we can tell it to ignore certain type parameters
+which the destructor only does "trivial" things with. Which is to say, `MyType<T>`
+doesn't do anything that `Vec<T>` wouldn't do with `T`.
+
+However we then also become responsible for telling dropck about all the types
+*related* to T that we drop. It knows we will drop anything in our fields, but
+things like raw pointers "trick" it, as dropping a raw pointer does nothing.
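To make the "dropping a raw pointer does nothing" point concrete, here is a small sketch on stable Rust. The `Noisy` and `RawHolder` types and the `drops_seen` function are hypothetical names invented for this illustration; the only standard library calls are `Box::into_raw`, `Box::from_raw`, and `Cell`.

```rust
use std::cell::Cell;

// Hypothetical helper: bumps a shared counter every time it is dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Hypothetical wrapper with no `Drop` impl: its only field is a raw pointer,
// so the compiler-generated drop glue has nothing to do.
struct RawHolder<T> {
    ptr: *mut T,
}

// Returns the drop count observed (after dropping the holder, after freeing the box).
fn drops_seen() -> (u32, u32) {
    let counter = Cell::new(0);
    let ptr = Box::into_raw(Box::new(Noisy(&counter)));

    // Dropping the holder drops a `*mut Noisy`, which is a no-op:
    // the pointee's destructor does not run.
    drop(RawHolder { ptr });
    let after_holder = counter.get();

    // Safety: `ptr` came from `Box::into_raw` above and has not been freed yet.
    unsafe { drop(Box::from_raw(ptr)) };

    (after_holder, counter.get())
}

fn main() {
    println!("{:?}", drops_seen());
}
```

The destructor only runs once the pointer is turned back into a `Box`. A type like `MyBox` does exactly that inside its own `Drop` impl, which is the part dropck cannot see by looking at the fields alone.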
+
+To solve this, you should include a `PhantomData` field that stores each of the
+types related to T that you may Drop.
+
+**Note that this includes any associated items that you may drop in the destructor.**
+
+For instance, if you have a destructor for `MyType<T: IntoIterator>` that calls
+`into_iter()`, you should probably include `PhantomData<T::IntoIter>`.
+
+Yes, this is a big hassle and easy to get wrong. Please don't use the eyepatch.
+
+
+
+
+
+[eyepatch]: https://github.com/rust-lang/rfcs/blob/master/text/1327-dropck-param-eyepatch.md
diff --git a/src/dropck.md b/src/dropck.md
index f1fef35..8829dbe 100644
--- a/src/dropck.md
+++ b/src/dropck.md
@@ -1,12 +1,8 @@
 # Drop Check
 
-We have seen how lifetimes provide us some fairly simple rules for ensuring
-that we never read dangling references. However up to this point we have only ever
-interacted with the *outlives* relationship in an inclusive manner. That is,
-when we talked about `'a: 'b`, it was ok for `'a` to live *exactly* as long as
-`'b`. At first glance, this seems to be a meaningless distinction. Nothing ever
-gets dropped at the same time as another, right? This is why we used the
-following desugaring of `let` statements:
+When looking at the *outlives* relationship in previous sections, we never
+considered the case where two values have the *exact* same lifetime. We made
+this clear by desugaring each let statement into its own scope:
 
 ```rust,ignore
 let x;
@@ -22,268 +18,152 @@ let y;
 }
 ```
 
-Each creates its own scope, clearly establishing that one drops before the
-other. However, what if we do the following?
+But what if we write the following let statement?
 
 ```rust,ignore
 let (x, y) = (vec![], vec![]);
 ```
 
 Does either value strictly outlive the other? The answer is in fact *no*,
-neither value strictly outlives the other. Of course, one of x or y will be
-dropped before the other, but the actual order is not specified. Tuples aren't
-special in this regard; composite structures just don't guarantee their
-destruction order as of Rust 1.0.
+neither value strictly outlives the other. At least, as far as the type system
+is concerned.
 
-We *could* specify this for the fields of built-in composites like tuples and
-structs. However, what about something like Vec? Vec has to manually drop its
-elements via pure-library code. In general, anything that implements Drop has
-a chance to fiddle with its innards during its final death knell. Therefore
-the compiler can't sufficiently reason about the actual destruction order
-of the contents of any type that implements Drop.
+In actual execution, Rust guarantees that `x` will be dropped before `y`.
+This is because they are stored in a composite value (a tuple), and composite
+values have their fields destroyed [in declaration order][drop-order].
 
-So why do we care? We care because if the type system isn't careful, it could
-accidentally make dangling pointers. Consider the following simple program:
+So why do we care if the compiler considers `x` and `y` to live for the same
+amount of time? Well, there's a special trick the compiler can do with equal
+lifetimes: it can let us hold onto dangling pointers during destruction! But
+we must be careful where we allow this, because any mistake can lead to a
+use-after-free.
+
+Consider the following simple program:
 
 ```rust
 struct Inspector<'a>(&'a u8);
 
 fn main() {
-    let (inspector, days);
+    let (days, inspector);
     days = Box::new(1);
     inspector = Inspector(&days);
 }
 ```
 
-This program is totally sound and compiles today. The fact that `days` does
-not *strictly* outlive `inspector` doesn't matter. As long as the `inspector`
-is alive, so is days.
+This program is perfectly sound, and even compiles today! The fact that `days`
+is dropped, and therefore freed, while `inspector` holds a pointer into it doesn't
+matter because `inspector` will *also* be destroyed before any code gets a chance
+to dereference that dangling pointer.
 
-However if we add a destructor, the program will no longer compile!
+Just to make it clear that something special is happening here, this code
+(which should behave identically at runtime) *doesn't* compile:
 
 ```rust,ignore
 struct Inspector<'a>(&'a u8);
 
-impl<'a> Drop for Inspector<'a> {
-    fn drop(&mut self) {
-        println!("I was only {} days from retirement!", self.0);
-    }
-}
-
 fn main() {
-    let (inspector, days);
+    let inspector;
+    let days;
     days = Box::new(1);
     inspector = Inspector(&days);
-    // Let's say `days` happens to get dropped first.
-    // Then when Inspector is dropped, it will try to read free'd memory!
 }
 ```
 
 ```text
 error: `days` does not live long enough
-  --> <anon>:15:1
-   |
-12 |     inspector = Inspector(&days);
-   |                            ---- borrow occurs here
-...
-15 | }
-   | ^ `days` dropped here while still borrowed
-   |
-   = note: values in a scope are dropped in the opposite order they are created
-
-error: aborting due to previous error
+ --> src/main.rs:8:1
+  |
+7 |     inspector = Inspector(&days);
+  |                            ---- borrow occurs here
+8 | }
+  | ^ `days` dropped here while still borrowed
+  |
+  = note: values in a scope are dropped in the opposite order they are created
 ```
 
-Implementing Drop lets the Inspector execute some arbitrary code during its
-death. This means it can potentially observe that types that are supposed to
-live as long as it does actually were destroyed first.
-
-Interestingly, only generic types need to worry about this. If they aren't
-generic, then the only lifetimes they can harbor are `'static`, which will truly
-live *forever*. This is why this problem is referred to as *sound generic drop*.
-Sound generic drop is enforced by the *drop checker*. As of this writing, some
-of the finer details of how the drop checker validates types is totally up in
-the air. However The Big Rule is the subtlety that we have focused on this whole
-section:
-
-**For a generic type to soundly implement drop, its generics arguments must
-strictly outlive it.**
-
-Obeying this rule is (usually) necessary to satisfy the borrow
-checker; obeying it is sufficient but not necessary to be
-sound. That is, if your type obeys this rule then it's definitely
-sound to drop.
+The fact that `inspector` and `days` are stored in the same composite is letting
+the compiler apply this special trick.
 
-The reason that it is not always necessary to satisfy the above rule
-is that some Drop implementations will not access borrowed data even
-though their type gives them the capability for such access.
-
-For example, this variant of the above `Inspector` example will never
-access borrowed data:
+Now the *really* interesting part is that if we add a destructor to Inspector,
+the program will *also* stop compiling!
 
 ```rust,ignore
-struct Inspector<'a>(&'a u8, &'static str);
+struct Inspector<'a>(&'a u8);
 
 impl<'a> Drop for Inspector<'a> {
     fn drop(&mut self) {
-        println!("Inspector(_, {}) knows when *not* to inspect.", self.1);
-    }
-}
-
-fn main() {
-    let (inspector, days);
-    days = Box::new(1);
-    inspector = Inspector(&days, "gadget");
-    // Let's say `days` happens to get dropped first.
-    // Even when Inspector is dropped, its destructor will not access the
-    // borrowed `days`.
-}
-```
-
-Likewise, this variant will also never access borrowed data:
-
-```rust,ignore
-use std::fmt;
-
-struct Inspector<T: fmt::Display>(T, &'static str);
-
-impl<T: fmt::Display> Drop for Inspector<T> {
-    fn drop(&mut self) {
-        println!("Inspector(_, {}) knows when *not* to inspect.", self.1);
+        println!("I was only {} days from retirement!", self.0);
     }
 }
 
 fn main() {
-    let (inspector, days): (Inspector<&u8>, Box<u8>);
+    let (days, inspector);
     days = Box::new(1);
-    inspector = Inspector(&days, "gadget");
-    // Let's say `days` happens to get dropped first.
-    // Even when Inspector is dropped, its destructor will not access the
-    // borrowed `days`.
-}
-```
-
-However, *both* of the above variants are rejected by the borrow
-checker during the analysis of `fn main`, saying that `days` does not
-live long enough.
-
-The reason is that the borrow checking analysis of `main` does not
-know about the internals of each Inspector's Drop implementation. As
-far as the borrow checker knows while it is analyzing `main`, the body
-of an inspector's destructor might access that borrowed data.
-
-Therefore, the drop checker forces all borrowed data in a value to
-strictly outlive that value.
-
-# An Escape Hatch
-
-The precise rules that govern drop checking may be less restrictive in
-the future.
-
-The current analysis is deliberately conservative and trivial; it forces all
-borrowed data in a value to outlive that value, which is certainly sound.
-
-Future versions of the language may make the analysis more precise, to
-reduce the number of cases where sound code is rejected as unsafe.
-This would help address cases such as the two Inspectors above that
-know not to inspect during destruction.
-
-In the meantime, there is an unstable attribute that one can use to
-assert (unsafely) that a generic type's destructor is *guaranteed* to
-not access any expired data, even if its type gives it the capability
-to do so.
-
-That attribute is called `may_dangle` and was introduced in [RFC 1327]
-(https://github.com/rust-lang/rfcs/blob/master/text/1327-dropck-param-eyepatch.md).
-To deploy it on the Inspector example from above, we would write:
-
-```rust,ignore
-struct Inspector<'a>(&'a u8, &'static str);
-
-unsafe impl<#[may_dangle] 'a> Drop for Inspector<'a> {
-    fn drop(&mut self) {
-        println!("Inspector(_, {}) knows when *not* to inspect.", self.1);
-    }
+    inspector = Inspector(&days);
+    // When Inspector is dropped here, it will try to read free'd memory!
 }
 ```
 
-Use of this attribute requires the `Drop` impl to be marked `unsafe` because the
-compiler is not checking the implicit assertion that no potentially expired data
-(e.g. `self.0` above) is accessed.
-
-The attribute can be applied to any number of lifetime and type parameters. In
-the following example, we assert that we access no data behind a reference of
-lifetime `'b` and that the only uses of `T` will be moves or drops, but omit
-the attribute from `'a` and `U`, because we do access data with that lifetime
-and that type:
-
-```rust,ignore
-use std::fmt::Display;
-
-struct Inspector<'a, 'b, T, U: Display>(&'a u8, &'b u8, T, U);
+```text
+error: `days` does not live long enough
+  --> <anon>:15:1
+   |
+12 |     inspector = Inspector(&days);
+   |                            ---- borrow occurs here
+...
+15 | }
+   | ^ `days` dropped here while still borrowed
+   |
+   = note: values in a scope are dropped in the opposite order they are created
 
-unsafe impl<'a, #[may_dangle] 'b, #[may_dangle] T, U: Display> Drop for Inspector<'a, 'b, T, U> {
-    fn drop(&mut self) {
-        println!("Inspector({}, _, _, {})", self.0, self.3);
-    }
-}
+error: aborting due to previous error
 ```
 
-It is sometimes obvious that no such access can occur, like the case above.
-However, when dealing with a generic type parameter, such access can
-occur indirectly. Examples of such indirect access are:
+Implementing Drop lets the Inspector execute arbitrary code during its
+death, which means it can dereference any dangling pointer that it contains.
+If we allowed this program to compile, it would perform a use-after-free.
 
-  * invoking a callback,
-  * via a trait method call.
+This is the *sound generic drop* issue. We call it that because it only applies
+to destructors of generic types. That is, if `Inspector` weren't generic, it
+couldn't store any lifetime other than `'static`, and dangling pointers would
+never be a concern. The enforcement of sound generic drop is handled by the
+*drop check*, which is more commonly known as *dropck*.
 
-(Future changes to the language, such as impl specialization, may add
-other avenues for such indirect access.)
+It turns out that getting dropck's design exactly right has been very difficult.
+This is because, as we'll see, we don't want it to give up completely on generic
+destructors. In particular: what if we knew `Inspector` *didn't* or even *couldn't*
+dereference the pointer in its destructor? Wouldn't it be nice if that meant our
+code compiled again?
 
-Here is an example of invoking a callback:
+Since Rust 1.0, and as of Rust 1.18, there have been two changes to how dropck
+works due to soundness issues. At least one more is planned, as the latest
+version was intended to be a temporary hack.
 
-```rust,ignore
-struct Inspector<T>(T, &'static str, Box<for <'r> fn(&'r T) -> String>);
+* [non-parametric dropck](https://github.com/rust-lang/rfcs/blob/master/text/1238-nonparametric-dropck.md)
+* [dropck eyepatch](https://github.com/rust-lang/rfcs/blob/master/text/1327-dropck-param-eyepatch.md)
 
-impl<T> Drop for Inspector<T> {
-    fn drop(&mut self) {
-        // The `self.2` call could access a borrow e.g. if `T` is `&'a _`.
- println!("Inspector({}, {}) unwittingly inspects expired data.", - (self.2)(&self.0), self.1); - } -} -``` - -Here is an example of a trait method call: +Old versions of this document were based on the original 1.0 design, which +was unsafe by default, and therefore expected unsafe Rust programmers to manually +opt out of it. -```rust,ignore -use std::fmt; +The good news is that, unlike the 1.0 design, the 1.18 design is *safe by default*: +you can completely ignore that dropck exists, and nothing bad can be done with +your types. But sometimes you will be able to leverage your knowledge of dropck +to make transiently dangling references work, and that's nice. -struct Inspector(T, &'static str); +So here's all you should need to know about dropck these days: -impl Drop for Inspector { - fn drop(&mut self) { - // There is a hidden call to `::fmt` below, which - // could access a borrow e.g. if `T` is `&'a _` - println!("Inspector({}, {}) unwittingly inspects expired data.", - self.0, self.1); - } -} -``` +* If a type isn't generic, then it cannot contain borrows that expire, and + is therefore uninteresting. +* If a type is generic, and doesn't have a destructor, its generic arguments + must live *at least* as long as it. +* If a type is generic, and *does* have a destructor, its generic arguments + must live *strictly* longer than it. -And of course, all of these accesses could be further hidden within -some other method invoked by the destructor, rather than being written -directly within it. +Or to put it another way: if you want to be able to store references to things +that live **exactly** as long as yourself, you can't have a destructor. -In all of the above cases where the `&'a u8` is accessed in the -destructor, adding the `#[may_dangle]` -attribute makes the type vulnerable to misuse that the borrower -checker will not catch, inviting havoc. It is better to avoid adding -the attribute. -# Is that all about drop checker?
-It turns out that when writing unsafe code, we generally don't need to -worry at all about doing the right thing for the drop checker. However there -is one special case that you need to worry about, which we will look at in -the next section. +[drop-order]: https://github.com/rust-lang/rfcs/blob/master/text/1857-stabilize-drop-order.md diff --git a/src/phantom-data.md b/src/phantom-data.md index 32539c2..f9ffb68 100644 --- a/src/phantom-data.md +++ b/src/phantom-data.md @@ -21,8 +21,13 @@ correct variance and drop checking. We do this using `PhantomData`, which is a special marker type. `PhantomData` consumes no space, but simulates a field of the given type for the purpose of static analysis. This was deemed to be less error-prone than explicitly telling -the type-system the kind of variance that you want, while also providing other -useful such as the information needed by drop check. +the type system the kind of variance that you want, while also being useful +for secondary concerns like deriving Send and Sync. + +When using the *drop check eyepatch*, PhantomData also becomes important for +telling the compiler about all types that you drop that it can't see. See +[the previous section][dropck-eyepatch] for details. This can be ignored if you +don't know what the eyepatch is. Iter logically contains a bunch of `&'a T`s, so this is exactly what we tell the PhantomData to simulate: @@ -40,65 +45,39 @@ struct Iter<'a, T: 'a> { and that's it. The lifetime will be bounded, and your iterator will be variant over `'a` and `T`. Everything Just Works. -Another important example is Vec, which is (approximately) defined as follows: - -``` -struct Vec { - data: *const T, // *const for variance! - len: usize, - cap: usize, -} -``` - -Unlike the previous example, it *appears* that everything is exactly as we -want. Every generic argument to Vec shows up in at least one field. -Good to go! - -Nope.
- -The drop checker will generously determine that `Vec` does not own any values -of type T. This will in turn make it conclude that it doesn't need to worry -about Vec dropping any T's in its destructor for determining drop check -soundness. This will in turn allow people to create unsoundness using -Vec's destructor. - -In order to tell dropck that we *do* own values of type T, and therefore may -drop some T's when *we* drop, we must add an extra PhantomData saying exactly -that: +Here's a more extreme example based on HashMap which stores a single opaque +allocation which is used for multiple arrays of different types: ``` use std::marker; -struct Vec { - data: *const T, // *const for covariance! - len: usize, - cap: usize, - _marker: marker::PhantomData, +struct HashMap { + ptr: *mut u8, + // The pointer actually stores keys and values + // (and hashes, but those aren't generic) + _marker: marker::PhantomData<(K, V)>, } ``` -Raw pointers that own an allocation is such a pervasive pattern that the -standard library made a utility for itself called `Unique` which: - -* wraps a `*const T` for variance -* includes a `PhantomData` -* auto-derives `Send`/`Sync` as if T was contained -* marks the pointer as `NonZero` for the null-pointer optimization ## Table of `PhantomData` patterns -Here’s a table of all the wonderful ways `PhantomData` could be used: +Here’s a table of all the most common ways `PhantomData` is used: | Phantom type | `'a` | `T` | |-----------------------------|-----------|---------------------------| -| `PhantomData` | - | variant (with drop check) | +| `PhantomData` | - | variant (and drop check T)| | `PhantomData<&'a T>` | variant | variant | | `PhantomData<&'a mut T>` | variant | invariant | | `PhantomData<*const T>` | - | variant | | `PhantomData<*mut T>` | - | invariant | -| `PhantomData` | - | contravariant (*) | +| `PhantomData` | - | contravariant | | `PhantomData T>` | - | variant | | `PhantomData T>` | - | invariant | | `PhantomData>` | 
invariant | - | -(*) If contravariance gets scrapped, this would be invariant. + + + + +[dropck-eyepatch]: dropck-eyepatch.html diff --git a/src/vec-alloc.md b/src/vec-alloc.md index 349cb50..7d8f064 100644 --- a/src/vec-alloc.md +++ b/src/vec-alloc.md @@ -1,15 +1,15 @@ # Allocating Memory -Using Unique throws a wrench in an important feature of Vec (and indeed all of +Using Shared throws a wrench in an important feature of Vec (and indeed all of the std collections): an empty Vec doesn't actually allocate at all. So if we can't allocate, but also can't put a null pointer in `ptr`, what do we do in `Vec::new`? Well, we just put some other garbage in there! This is perfectly fine because we already have `cap == 0` as our sentinel for no -allocation. We don't even need to handle it specially in almost any code because +allocation. We don't need to handle it specially in almost any code because we usually need to check if `cap > len` or `len > 0` anyway. The recommended -Rust value to put here is `mem::align_of::()`. Unique provides a convenience -for this: `Unique::empty()`. There are quite a few places where we'll +Rust value to put here is `mem::align_of::()`. Shared provides a convenience +for this: `Shared::empty()`. There are quite a few places where we'll want to use `empty` because there's no real allocation to talk about but `null` would make the compiler do bad things. 
@@ -23,7 +23,7 @@ use std::mem; impl Vec { fn new() -> Self { assert!(mem::size_of::() != 0, "We're not ready to handle ZSTs"); - Vec { ptr: Unique::empty(), len: 0, cap: 0 } + Vec { ptr: Shared::empty(), len: 0, cap: 0 } } } ``` @@ -202,7 +202,7 @@ fn grow(&mut self) { // If allocate or reallocate fail, we'll get `null` back if ptr.is_null() { oom(); } - self.ptr = Unique::new(ptr as *mut _); + self.ptr = Shared::new(ptr as *mut _); self.cap = new_cap; } } diff --git a/src/vec-final.md b/src/vec-final.md index a534bcf..51dc2ed 100644 --- a/src/vec-final.md +++ b/src/vec-final.md @@ -4,14 +4,14 @@ #![feature(unique)] #![feature(allocator_api)] -use std::ptr::{Unique, self}; +use std::ptr::{Shared, self}; use std::mem; use std::ops::{Deref, DerefMut}; use std::marker::PhantomData; use std::heap::{Alloc, Layout, Heap}; struct RawVec { - ptr: Unique, + ptr: Shared, cap: usize, } @@ -20,8 +20,8 @@ impl RawVec { // !0 is usize::MAX. This branch should be stripped at compile time. let cap = if mem::size_of::() == 0 { !0 } else { 0 }; - // Unique::empty() doubles as "unallocated" and "zero-sized allocation" - RawVec { ptr: Unique::empty(), cap: cap } + // Shared::empty() doubles as "unallocated" and "zero-sized allocation" + RawVec { ptr: Shared::empty(), cap: cap } } fn grow(&mut self) { @@ -49,7 +49,7 @@ impl RawVec { Err(err) => Heap.oom(err), }; - self.ptr = Unique::new_unchecked(ptr as *mut _); + self.ptr = Shared::new_unchecked(ptr as *mut _); self.cap = new_cap; } } diff --git a/src/vec-into-iter.md b/src/vec-into-iter.md index df36757..1780071 100644 --- a/src/vec-into-iter.md +++ b/src/vec-into-iter.md @@ -44,7 +44,7 @@ So we're going to use the following struct: ```rust,ignore struct IntoIter { - buf: Unique, + buf: Shared, cap: usize, start: *const T, end: *const T, diff --git a/src/vec-layout.md b/src/vec-layout.md index 795f1ac..89ad997 100644 --- a/src/vec-layout.md +++ b/src/vec-layout.md @@ -15,68 +15,41 @@ pub struct Vec { # fn main() {} ``` -And indeed 
this would compile. Unfortunately, it would be incorrect. First, the -compiler will give us too strict variance. So a `&Vec<&'static str>` -couldn't be used where an `&Vec<&'a str>` was expected. More importantly, it -will give incorrect ownership information to the drop checker, as it will -conservatively assume we don't own any values of type `T`. See [the chapter -on ownership and lifetimes][ownership] for all the details on variance and -drop check. +And indeed this would compile and work correctly. However it comes with a semantic +limitation and a missed optimization opportunity. -As we saw in the ownership chapter, we should use `Unique` in place of -`*mut T` when we have a raw pointer to an allocation we own. Unique is unstable, -so we'd like to not use it if possible, though. +In terms of semantics, this implementation of Vec would be [invariant over T][variance]. +So a `&Vec<&'static str>` couldn't be used where an `&Vec<&'a str>` was expected. -As a recap, Unique is a wrapper around a raw pointer that declares that: +In terms of optimization, this implementation of Vec wouldn't be eligible for the +*null pointer optimization*, meaning `Option>` would take up more space +than `Vec`. -* We are variant over `T` -* We may own a value of type `T` (for drop check) -* We are Send/Sync if `T` is Send/Sync -* Our pointer is never null (so `Option>` is null-pointer-optimized) +These are fairly common problems because the raw pointer types in Rust aren't +very well optimized for this use-case. They're more tuned to make it easier to +express C APIs. This is why the standard library provides a pointer type that +better matches the semantics pure-Rust abstractions want: `Shared`. 
-We can implement all of the above requirements except for the last -one in stable Rust: +Compared to `*mut T`, `Shared` provides two benefits: -```rust -use std::marker::PhantomData; -use std::ops::Deref; -use std::mem; - -struct Unique { - ptr: *const T, // *const for variance - _marker: PhantomData, // For the drop checker -} - -// Deriving Send and Sync is safe because we are the Unique owners -// of this data. It's like Unique is "just" T. -unsafe impl Send for Unique {} -unsafe impl Sync for Unique {} - -impl Unique { - pub fn new(ptr: *mut T) -> Self { - Unique { ptr: ptr, _marker: PhantomData } - } +* Variant over `T` (dangerous in general, but desirable for collections) +* Null-pointer optimizes (so `Option>` is pointer-sized) - pub fn as_ptr(&self) -> *mut T { - self.ptr as *mut T - } -} +We could get the variance requirement ourselves using `*const T` and casts, but +the API for expressing that a value is non-zero is unstable, and that isn't expected +to change any time soon. -# fn main() {} -``` - -Unfortunately the mechanism for stating that your value is non-zero is -unstable and unlikely to be stabilized soon. As such we're just going to -take the hit and use std's Unique: +Shared should be stabilized in some form very soon, so we're just going to use +that. ```rust -#![feature(unique)] +#![feature(shared)] -use std::ptr::{Unique, self}; +use std::ptr::Shared; pub struct Vec { - ptr: Unique, + ptr: Shared, cap: usize, len: usize, } @@ -84,11 +57,14 @@ pub struct Vec { # fn main() {} ``` -If you don't care about the null-pointer optimization, then you can use the -stable code. However we will be designing the rest of the code around enabling -this optimization. It should be noted that `Unique::new` is unsafe to call, because -putting `null` inside of it is Undefined Behavior. Our stable Unique doesn't -need `new` to be unsafe because it doesn't make any interesting guarantees about -its contents.
+If you don't care about the null-pointer optimization, then you can use `*const T`. +For most code, using `*mut T` would also be perfectly reasonable. +However this chapter is focused on providing an implementation that matches the +quality of the one in the standard library, so we will be designing the rest of +the code around using Shared. + +Lastly, it should be noted that `Shared::new` is unsafe to call, because +putting `null` inside of it is Undefined Behavior. Code that doesn't use Shared +has no such concern. -[ownership]: ownership.html +[variance]: variance.html diff --git a/src/vec-raw.md b/src/vec-raw.md index ad24b61..c352d70 100644 --- a/src/vec-raw.md +++ b/src/vec-raw.md @@ -11,14 +11,14 @@ allocating, growing, and freeing: ```rust,ignore struct RawVec { - ptr: Unique, + ptr: Shared, cap: usize, } impl RawVec { fn new() -> Self { assert!(mem::size_of::() != 0, "TODO: implement ZST support"); - RawVec { ptr: Unique::empty(), cap: 0 } + RawVec { ptr: Shared::empty(), cap: 0 } } // unchanged from Vec @@ -42,7 +42,7 @@ impl RawVec { // If allocate or reallocate fail, we'll get `null` back if ptr.is_null() { oom() } - self.ptr = Unique::new(ptr as *mut _); + self.ptr = Shared::new(ptr as *mut _); self.cap = new_cap; } } diff --git a/src/vec-zsts.md b/src/vec-zsts.md index 7334404..74e32a1 100644 --- a/src/vec-zsts.md +++ b/src/vec-zsts.md @@ -19,7 +19,7 @@ RawValIter and RawVec respectively. How mysteriously convenient. ## Allocating Zero-Sized Types So if the allocator API doesn't support zero-sized allocations, what on earth -do we store as our allocation? `Unique::empty()` of course! Almost every operation +do we store as our allocation? `Shared::empty()` of course! Almost every operation with a ZST is a no-op since ZSTs have exactly one value, and therefore no state needs to be considered to store or load them. This actually extends to `ptr::read` and `ptr::write`: they won't actually look at the pointer at all. 
As such we never need @@ -38,8 +38,8 @@ impl RawVec { // !0 is usize::MAX. This branch should be stripped at compile time. let cap = if mem::size_of::() == 0 { !0 } else { 0 }; - // Unique::empty() doubles as "unallocated" and "zero-sized allocation" - RawVec { ptr: Unique::empty(), cap: cap } + // Shared::empty() doubles as "unallocated" and "zero-sized allocation" + RawVec { ptr: Shared::empty(), cap: cap } } fn grow(&mut self) { @@ -67,7 +67,7 @@ impl RawVec { // If allocate or reallocate fail, we'll get `null` back if ptr.is_null() { oom() } - self.ptr = Unique::new(ptr as *mut _); + self.ptr = Shared::new(ptr as *mut _); self.cap = new_cap; } }