mirror of https://github.com/rust-lang/nomicon
parent
df793ee850
commit
745dcebe39
@ -0,0 +1,127 @@
|
|||||||
|
% Drop Check
|
||||||
|
|
||||||
|
We have seen how lifetimes provide us some fairly simple rules for ensuring
that we never read dangling references. However, up to this point we have only ever
interacted with the *outlives* relationship in an inclusive manner. That is,
when we talked about `'a: 'b`, it was ok for `'a` to live *exactly* as long as
`'b`. At first glance, this seems to be a meaningless distinction. Nothing ever
gets dropped at the same time as another, right? This is why we used the
following desugaring of `let` statements:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
let x;
|
||||||
|
let y;
|
||||||
|
```
|
||||||
|
|
||||||
|
```rust
{
    let x;
    {
        let y;
    }
}
```
|
||||||
|
|
||||||
|
Each creates its own scope, clearly establishing that one drops before the
|
||||||
|
other. However, what if we do the following?
|
||||||
|
|
||||||
|
```rust
|
||||||
|
let (x, y) = (vec![], vec![]);
|
||||||
|
```
|
||||||
|
|
||||||
|
Does either value strictly outlive the other? The answer is in fact *no*,
neither value strictly outlives the other. Of course, one of `x` or `y` will be
dropped before the other, but the actual order is not specified. Tuples aren't
special in this regard; composite structures just don't guarantee their
destruction order as of Rust 1.0.
|
||||||
|
|
||||||
|
We *could* specify this for the fields of built-in composites like tuples and
structs. However, what about something like `Vec`? `Vec` has to manually drop its
elements via pure-library code. In general, anything that implements `Drop` has
a chance to fiddle with its innards during its final death knell. Therefore,
the compiler can't sufficiently reason about the actual destruction order
of the contents of any type that implements `Drop`.
|
||||||
|
|
||||||
|
So why do we care? We care because if the type system isn't careful, it could
|
||||||
|
accidentally make dangling pointers. Consider the following simple program:
|
||||||
|
|
||||||
|
```rust
struct Inspector<'a>(&'a u8);

fn main() {
    let (days, inspector);
    days = Box::new(1);
    inspector = Inspector(&days);
}
```
|
||||||
|
|
||||||
|
This program is totally sound and compiles today. The fact that `days` does
not *strictly* outlive `inspector` doesn't matter. As long as the `inspector`
is alive, so is `days`.
|
||||||
|
|
||||||
|
However, if we add a destructor, the program will no longer compile!
|
||||||
|
|
||||||
|
```rust,ignore
struct Inspector<'a>(&'a u8);

impl<'a> Drop for Inspector<'a> {
    fn drop(&mut self) {
        println!("I was only {} days from retirement!", self.0);
    }
}

fn main() {
    let (days, inspector);
    days = Box::new(1);
    inspector = Inspector(&days);
    // Let's say `days` happens to get dropped first.
    // Then when Inspector is dropped, it will try to read free'd memory!
}
```
|
||||||
|
|
||||||
|
```text
|
||||||
|
<anon>:12:28: 12:32 error: `days` does not live long enough
|
||||||
|
<anon>:12 inspector = Inspector(&days);
|
||||||
|
^~~~
|
||||||
|
<anon>:9:11: 15:2 note: reference must be valid for the block at 9:10...
|
||||||
|
<anon>:9 fn main() {
|
||||||
|
<anon>:10 let (days, inspector);
|
||||||
|
<anon>:11 days = Box::new(1);
|
||||||
|
<anon>:12 inspector = Inspector(&days);
|
||||||
|
<anon>:13 // Let's say `days` happens to get dropped first.
|
||||||
|
<anon>:14 // Then when Inspector is dropped, it will try to read free'd memory!
|
||||||
|
...
|
||||||
|
<anon>:10:27: 15:2 note: ...but borrowed value is only valid for the block suffix following statement 0 at 10:26
|
||||||
|
<anon>:10 let (days, inspector);
|
||||||
|
<anon>:11 days = Box::new(1);
|
||||||
|
<anon>:12 inspector = Inspector(&days);
|
||||||
|
<anon>:13 // Let's say `days` happens to get dropped first.
|
||||||
|
<anon>:14 // Then when Inspector is dropped, it will try to read free'd memory!
|
||||||
|
<anon>:15 }
|
||||||
|
```
|
||||||
|
|
||||||
|
Implementing `Drop` lets the `Inspector` execute some arbitrary code *during* its
death. This means it can potentially observe that values that are supposed to
live as long as it does were actually destroyed first.
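
One way to keep a destructor like this sound is to make the borrowed data *strictly* outlive the value that borrows it. The following is a minimal sketch of that idea, not the program above: the `seen` field and the `observe_on_drop` helper are hypothetical additions that let the destructor's observation be checked instead of printed.

```rust
use std::cell::Cell;

// Hypothetical variant of `Inspector`: besides the borrowed `u8`, it carries
// a `Cell` in which the destructor records what it saw.
struct Inspector<'a> {
    days: &'a u8,
    seen: &'a Cell<Option<u8>>,
}

impl<'a> Drop for Inspector<'a> {
    fn drop(&mut self) {
        // Runs while the inspector is being destroyed and reads borrowed data.
        self.seen.set(Some(*self.days));
    }
}

// Hypothetical helper: returns what the destructor observed.
fn observe_on_drop() -> Option<u8> {
    let seen = Cell::new(None);
    let days = Box::new(1u8);
    {
        // Declared in an inner scope, so `days` and `seen` strictly outlive it.
        let _inspector = Inspector { days: &*days, seen: &seen };
    } // `_inspector` is dropped here, while `days` is still alive.
    seen.get()
}
```

Because `_inspector` lives in its own inner scope, its destructor is guaranteed to run before `days` is freed, so the read in `drop` is never dangling.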
|
||||||
|
|
||||||
|
Interestingly, only *generic* types need to worry about this. If they aren't
generic, then the only lifetimes they can harbor are `'static`, which will truly
live *forever*. This is why this problem is referred to as *sound generic drop*.
Sound generic drop is enforced by the *drop checker*. As of this writing, some
of the finer details of how the drop checker validates types are totally up in
the air. However, The Big Rule is the subtlety that we have focused on this whole
section:
|
||||||
|
|
||||||
|
**For a generic type to soundly implement drop, it must strictly outlive all of
|
||||||
|
its generic arguments.**
|
||||||
|
|
||||||
|
This rule is sufficient but not necessary to satisfy the drop checker. That is,
if your type obeys this rule then it's *definitely* sound to drop. However,
there are special cases where you can fail to satisfy this, but still
successfully pass the borrow checker. These are the precise rules that are
currently up in the air.
|
||||||
|
|
||||||
|
It turns out that when writing unsafe code, we generally don't need to
worry at all about doing the right thing for the drop checker. However, there
is *one* special case that you need to worry about, which we will look at in
the next section.
|
@ -0,0 +1,140 @@
|
|||||||
|
% Splitting Lifetimes
|
||||||
|
|
||||||
|
The mutual exclusion property of mutable references can be very limiting when
working with a composite structure. The borrow checker understands some basic
cases, but will fall over pretty easily. It *does* understand structs sufficiently to
know that it's possible to borrow disjoint fields of a struct simultaneously.
So this works today:
|
||||||
|
|
||||||
|
```rust
struct Foo {
    a: i32,
    b: i32,
    c: i32,
}

let mut x = Foo {a: 0, b: 0, c: 0};
let a = &mut x.a;
let b = &mut x.b;
let c = &x.c;
*b += 1;
let c2 = &x.c;
*a += 10;
println!("{} {} {} {}", a, b, c, c2);
```
|
||||||
|
|
||||||
|
However, borrowck doesn't understand arrays or slices in any way, so this doesn't
work:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
let x = [1, 2, 3];
|
||||||
|
let a = &mut x[0];
|
||||||
|
let b = &mut x[1];
|
||||||
|
println!("{} {}", a, b);
|
||||||
|
```
|
||||||
|
|
||||||
|
```text
|
||||||
|
<anon>:3:18: 3:22 error: cannot borrow immutable indexed content `x[..]` as mutable
|
||||||
|
<anon>:3 let a = &mut x[0];
|
||||||
|
^~~~
|
||||||
|
<anon>:4:18: 4:22 error: cannot borrow immutable indexed content `x[..]` as mutable
|
||||||
|
<anon>:4 let b = &mut x[1];
|
||||||
|
^~~~
|
||||||
|
error: aborting due to 2 previous errors
|
||||||
|
```
|
||||||
|
|
||||||
|
While it was plausible that borrowck could understand this simple case, it's
|
||||||
|
pretty clearly hopeless for borrowck to understand disjointness in general
|
||||||
|
container types like a tree, especially if distinct keys actually *do* map
|
||||||
|
to the same value.
|
||||||
|
|
||||||
|
In order to "teach" borrowck that what we're doing is ok, we need to drop down
to unsafe code. For instance, mutable slices expose a `split_at_mut` function that
consumes the slice and returns *two* mutable slices: one for everything to the
left of the index, and one for everything to the right. Intuitively we know this
is safe because the slices don't alias. However, the implementation requires some
unsafety:
|
||||||
|
|
||||||
|
```rust,ignore
fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
    unsafe {
        let self2: &mut [T] = mem::transmute_copy(&self);

        (ops::IndexMut::index_mut(self, ops::RangeTo { end: mid } ),
         ops::IndexMut::index_mut(self2, ops::RangeFrom { start: mid } ))
    }
}
```
|
||||||
|
|
||||||
|
This is pretty plainly dangerous. We use transmute to duplicate the slice with an
|
||||||
|
*unbounded* lifetime, so that it can be treated as disjoint from the other until
|
||||||
|
we unify them when we return.
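
The same splitting can also be expressed with raw pointers instead of `transmute_copy`. The following is only a sketch of that alternative technique, not the implementation shown above: `split_at_mut_sketch` and `add_first_to_last` are hypothetical names, and the explicit `assert!` is there because nothing else in the sketch checks that `mid` is in bounds.

```rust
use std::slice;

// Hypothetical free-function version of `split_at_mut`, for illustration only.
fn split_at_mut_sketch<T>(values: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();
    // Without this check the two halves could reach out of bounds.
    assert!(mid <= len);
    unsafe {
        // The halves cover [0, mid) and [mid, len), so they never overlap.
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

// Hypothetical caller: both halves are mutably borrowed at the same time.
fn add_first_to_last(values: &mut [i32]) {
    if values.len() < 2 {
        return;
    }
    let (left, right) = split_at_mut_sketch(values, 1);
    let last = right.len() - 1;
    right[last] += left[0];
    left[0] = 0;
}
```

From the caller's side everything is safe code: the signature ties both returned slices to the single input borrow, which is the "unify them when we return" step described above.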
|
||||||
|
|
||||||
|
More subtle, however, is how iterators that yield mutable references work.
The iterator trait is defined as follows:
|
||||||
|
|
||||||
|
```rust
trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}
```
|
||||||
|
|
||||||
|
Given this definition, `Self::Item` has *no* connection to `self`. This means
that we can call `next` several times in a row, and hold onto all the results
*concurrently*. This is perfectly fine for by-value iterators, which have exactly
these semantics. It's also actually fine for shared references, as they admit
arbitrarily many references to the same thing (although the
iterator needs to be a separate object from the thing being shared). But mutable
references make this a mess. At first glance, they might seem completely
incompatible with this API, as it would produce multiple mutable references to
the same object!
|
||||||
|
|
||||||
|
However, it actually *does* work, exactly because iterators are one-shot objects.
Everything an IterMut yields will be yielded *at most* once, so we don't *actually*
ever yield multiple mutable references to the same piece of data.
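
As a small illustration, using the standard slice `iter_mut` (the function name `swap_first_two` is made up for this sketch), two results of `next` can be held at once because each refers to a different element:

```rust
// Hypothetical helper: holds two items yielded by the same iterator at once.
fn swap_first_two(values: &mut [i32]) {
    let mut iter = values.iter_mut();
    let first = iter.next();
    let second = iter.next();
    // `first` and `second` are both live here; they point at distinct elements.
    if let (Some(a), Some(b)) = (first, second) {
        std::mem::swap(a, b);
    }
}
```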
|
||||||
|
|
||||||
|
In general, all mutable iterators require *some* unsafe code *somewhere*, though,
whether it's raw pointers or safely composing on top of *another* IterMut.
|
||||||
|
|
||||||
|
For instance, VecDeque's IterMut:
|
||||||
|
|
||||||
|
```rust,ignore
struct IterMut<'a, T:'a> {
    // The whole backing array. Some of these indices are initialized!
    ring: &'a mut [T],
    tail: usize,
    head: usize,
}

impl<'a, T> Iterator for IterMut<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<&'a mut T> {
        if self.tail == self.head {
            return None;
        }
        let tail = self.tail;
        self.tail = wrap_index(self.tail.wrapping_add(1), self.ring.len());

        unsafe {
            // might as well do unchecked indexing since wrap_index has us
            // in-bounds, and many of the "middle" indices are uninitialized
            // anyway.
            let elem = self.ring.get_unchecked_mut(tail);

            // round-trip through a raw pointer to unbound the lifetime from
            // ourselves
            Some(&mut *(elem as *mut _))
        }
    }
}
```
|
||||||
|
|
||||||
|
A very subtle but interesting detail in this design is that it *relies on
privacy to be sound*. Borrowck works on some very simple rules. One of those rules
is that if we have a live `&mut Foo` and `Foo` contains an `&mut Bar`, then that
`&mut Bar` is *also* live. Since IterMut is always live when `next` can be called, if
`ring` were public then we could mutate `ring` while outstanding mutable borrows
to it exist!
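
The other route mentioned earlier, safely composing on top of existing APIs, can be sketched for plain slices. This is an illustrative example rather than how any standard iterator is written: `SliceIterMut` and `double_all` are hypothetical names, and the unsafe code it ultimately relies on lives inside the standard library's `split_first_mut`.

```rust
use std::mem;

// Hypothetical mutable iterator over a slice, written without `unsafe`.
struct SliceIterMut<'a, T: 'a>(&'a mut [T]);

impl<'a, T> Iterator for SliceIterMut<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<Self::Item> {
        // Move the slice out of `self`, leaving an empty one behind, so the
        // pieces split off below are not tied to the borrow of `self`.
        let slice = mem::take(&mut self.0);
        let (first, rest) = slice.split_first_mut()?;
        // Keep only the remainder; `first` can never be yielded again.
        self.0 = rest;
        Some(first)
    }
}

// Hypothetical caller that mutates every element through the iterator.
fn double_all(values: &mut [i32]) {
    for value in SliceIterMut(values) {
        *value *= 2;
    }
}
```

The field is private here too, so callers cannot reach the remaining slice while previously yielded references are still alive.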
|
@ -0,0 +1,87 @@
|
|||||||
|
% PhantomData
|
||||||
|
|
||||||
|
When working with unsafe code, we can often end up in a situation where
|
||||||
|
types or lifetimes are logically associated with a struct, but not actually
|
||||||
|
part of a field. This most commonly occurs with lifetimes. For instance, the
|
||||||
|
`Iter` for `&'a [T]` is (approximately) defined as follows:
|
||||||
|
|
||||||
|
```rust,ignore
struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
}
```
|
||||||
|
|
||||||
|
However because `'a` is unused within the struct's body, it's *unbounded*.
|
||||||
|
Because of the troubles this has historically caused, unbounded lifetimes and
|
||||||
|
types are *illegal* in struct definitions. Therefore we must somehow refer
|
||||||
|
to these types in the body. Correctly doing this is necessary to have
|
||||||
|
correct variance and drop checking.
|
||||||
|
|
||||||
|
We do this using *PhantomData*, which is a special marker type. PhantomData
consumes no space, but simulates a field of the given type for the purpose of
static analysis. This was deemed to be less error-prone than explicitly telling
the type-system the kind of variance that you want, while also providing other
useful information, such as that needed by drop check.
|
||||||
|
|
||||||
|
Iter logically contains a bunch of `&'a T`s, so this is exactly what we tell
|
||||||
|
the PhantomData to simulate:
|
||||||
|
|
||||||
|
```
use std::marker;

struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: marker::PhantomData<&'a T>,
}
```
|
||||||
|
|
||||||
|
And that's it. The lifetime will be bounded, and your iterator will be covariant
over `'a` and `T`. Everything Just Works.
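
To see the marker doing its job, here is a minimal sketch of such an iterator that can actually be driven. It is an approximation for illustration, not the standard library's `Iter`: the names `SliceIter`, `new`, and `sum_all` are hypothetical, and the sketch assumes `T` is not zero-sized (for a zero-sized `T` the two pointers are equal and nothing is yielded).

```rust
use std::marker::PhantomData;

// Hypothetical pointer-pair iterator over `&'a [T]`.
struct SliceIter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    // Ties the raw pointers to the borrow they came from.
    _marker: PhantomData<&'a T>,
}

impl<'a, T> SliceIter<'a, T> {
    fn new(slice: &'a [T]) -> Self {
        let range = slice.as_ptr_range();
        SliceIter { ptr: range.start, end: range.end, _marker: PhantomData }
    }
}

impl<'a, T> Iterator for SliceIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.ptr == self.end {
            return None;
        }
        unsafe {
            // `ptr` is strictly before `end`, so it points at a live element.
            let item = &*self.ptr;
            self.ptr = self.ptr.add(1);
            Some(item)
        }
    }
}

// Hypothetical caller.
fn sum_all(values: &[i32]) -> i32 {
    SliceIter::new(values).sum()
}
```

Because `new` returns `SliceIter<'a, T>` for a `&'a [T]`, the borrow checker keeps the slice borrowed for as long as the iterator or any yielded reference is in use, even though the struct only stores raw pointers.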
|
||||||
|
|
||||||
|
Another important example is Vec, which is (approximately) defined as follows:
|
||||||
|
|
||||||
|
```
struct Vec<T> {
    data: *const T, // *const for variance!
    len: usize,
    cap: usize,
}
```
|
||||||
|
|
||||||
|
Unlike the previous example, it *appears* that everything is exactly as we
want. Every generic argument to Vec shows up in at least one field.
Good to go!
|
||||||
|
|
||||||
|
Nope.
|
||||||
|
|
||||||
|
The drop checker will generously determine that `Vec<T>` does not own any values
of type T. This will in turn make it conclude that it does *not* need to worry
about Vec dropping any T's in its destructor for determining drop check
soundness. This will in turn allow people to create unsoundness using
Vec's destructor.
|
||||||
|
|
||||||
|
In order to tell dropck that we *do* own values of type T, and therefore may
|
||||||
|
drop some T's when *we* drop, we must add an extra PhantomData saying exactly
|
||||||
|
that:
|
||||||
|
|
||||||
|
```
use std::marker;

struct Vec<T> {
    data: *const T, // *const for covariance!
    len: usize,
    cap: usize,
    _marker: marker::PhantomData<T>,
}
```
|
||||||
|
|
||||||
|
Raw pointers that own an allocation are such a pervasive pattern that the
standard library made a utility for itself called `Unique<T>` which:

* wraps a `*const T` for variance
* includes a `PhantomData<T>`
* auto-derives Send/Sync as if T was contained
* marks the pointer as NonZero for the null-pointer optimization
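
The four properties above can be approximated in user code. The sketch below is an assumption-laden stand-in, not the standard library's `Unique<T>`: it swaps in the stable `std::ptr::NonNull<T>` for the `*const T` plus NonZero combination, and `UniqueSketch`, `new`, `as_ptr`, and `option_is_pointer_sized` are hypothetical names.

```rust
use std::marker::PhantomData;
use std::mem;
use std::ptr::NonNull;

// Hypothetical owning-pointer wrapper.
#[repr(transparent)]
struct UniqueSketch<T> {
    // `NonNull<T>` is covariant in `T` and is never null.
    ptr: NonNull<T>,
    // Tells the drop checker that we logically own values of type `T`.
    _marker: PhantomData<T>,
}

// Raw pointers are neither Send nor Sync, so opt back in as if a `T` were
// stored directly.
unsafe impl<T: Send> Send for UniqueSketch<T> {}
unsafe impl<T: Sync> Sync for UniqueSketch<T> {}

impl<T> UniqueSketch<T> {
    // Returns `None` for a null pointer.
    fn new(ptr: *mut T) -> Option<Self> {
        NonNull::new(ptr).map(|ptr| UniqueSketch { ptr, _marker: PhantomData })
    }

    fn as_ptr(&self) -> *mut T {
        self.ptr.as_ptr()
    }
}

// Checks whether the null-pointer optimization applies to the wrapper:
// `Option<UniqueSketch<u8>>` is expected to fit in a single pointer.
fn option_is_pointer_sized() -> bool {
    mem::size_of::<Option<UniqueSketch<u8>>>() == mem::size_of::<*mut u8>()
}
```

The wrapper does not free anything by itself; a container built on it would still need its own `Drop` implementation to release the allocation.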