shard out misc section on lifetimes properly

pull/10/head
Alexis Beingessner 9 years ago committed by Manish Goregaokar
parent df793ee850
commit 745dcebe39

@ -15,7 +15,9 @@
* [Unbounded Lifetimes](unbounded-lifetimes.md)
* [Higher-Rank Trait Bounds](hrtb.md)
* [Subtyping and Variance](subtyping.md)
* [Misc](lifetime-misc.md)
* [Drop Check](dropck.md)
* [PhantomData](phantom-data.md)
* [Splitting Lifetimes](lifetime-splitting.md)
* [Type Conversions](conversions.md)
* [Coercions](coercions.md)
* [The Dot Operator](dot-operator.md)

@ -0,0 +1,127 @@
% Drop Check
We have seen how lifetimes provide us some fairly simple rules for ensuring
that never read dangling references. However up to this point we have only ever
interacted with the *outlives* relationship in an inclusive manner. That is,
when we talked about `'a: 'b`, it was ok for `'a` to live *exactly* as long as
`'b`. At first glance, this seems to be a meaningless distinction. Nothing ever
gets dropped at the same time as another, right? This is why we used the
following desugarring of `let` statements:
```rust
let x;
let y;
```
```rust
{
let x;
{
let y;
}
}
```
Each creates its own scope, clearly establishing that one drops before the
other. However, what if we do the following?
```rust
let (x, y) = (vec![], vec![]);
```
Does either value strictly outlive the other? The answer is in fact *no*,
neither value strictly outlives the other. Of course, one of x or y will be
dropped before the other, but the actual order is not specified. Tuples aren't
special in this regard; composite structures just don't guarantee their
destruction order as of Rust 1.0.
We *could* specify this for the fields of built-in composites like tuples and
structs. However, what about something like Vec? Vec has to manually drop its
elements via pure-library code. In general, anything that implements Drop has
a chance to fiddle with its innards during its final death knell. Therefore
the compiler can't sufficiently reason about the actual destruction order
of the contents of any type that implements Drop.
So why do we care? We care because if the type system isn't careful, it could
accidentally make dangling pointers. Consider the following simple program:
```rust
struct Inspector<'a>(&'a u8);
fn main() {
let (days, inspector);
days = Box::new(1);
inspector = Inspector(&days);
}
```
This program is totally sound and compiles today. The fact that `days` does
not *strictly* outlive `inspector` doesn't matter. As long as the `inspector`
is alive, so is days.
However if we add a destructor, the program will no longer compile!
```rust,ignore
struct Inspector<'a>(&'a u8);
impl<'a> Drop for Inspector<'a> {
fn drop(&mut self) {
println!("I was only {} days from retirement!", self.0);
}
}
fn main() {
let (days, inspector);
days = Box::new(1);
inspector = Inspector(&days);
// Let's say `days` happens to get dropped first.
// Then when Inspector is dropped, it will try to read free'd memory!
}
```
```text
<anon>:12:28: 12:32 error: `days` does not live long enough
<anon>:12 inspector = Inspector(&days);
^~~~
<anon>:9:11: 15:2 note: reference must be valid for the block at 9:10...
<anon>:9 fn main() {
<anon>:10 let (days, inspector);
<anon>:11 days = Box::new(1);
<anon>:12 inspector = Inspector(&days);
<anon>:13 // Let's say `days` happens to get dropped first.
<anon>:14 // Then when Inspector is dropped, it will try to read free'd memory!
...
<anon>:10:27: 15:2 note: ...but borrowed value is only valid for the block suffix following statement 0 at 10:26
<anon>:10 let (days, inspector);
<anon>:11 days = Box::new(1);
<anon>:12 inspector = Inspector(&days);
<anon>:13 // Let's say `days` happens to get dropped first.
<anon>:14 // Then when Inspector is dropped, it will try to read free'd memory!
<anon>:15 }
```
Implementing Drop lets the Inspector execute some arbitrary code *during* its
death. This means it can potentially observe that types that are supposed to
live as long as it does actually were destroyed first.
Interestingly, only *generic* types need to worry about this. If they aren't
generic, then the only lifetimes they can harbor are `'static`, which will truly
live *forever*. This is why this problem is referred to as *sound generic drop*.
Sound generic drop is enforced by the *drop checker*. As of this writing, some
of the finer details of how the drop checker validates types is totally up in
the air. However The Big Rule is the subtlety that we have focused on this whole
section:
**For a generic type to soundly implement drop, it must strictly outlive all of
its generic arguments.**
This rule is sufficient but not necessary to satisfy the drop checker. That is,
if your type obeys this rule then it's *definitely* sound to drop. However
there are special cases where you can fail to satisfy this, but still
successfully pass the borrow checker. These are the precise rules that are
currently up in the air.
It turns out that when writing unsafe code, we generally don't need to
worry at all about doing the right thing for the drop checker. However there
is *one* special case that you need to worry about, which we will look at in
the next section.

@ -0,0 +1,140 @@
% Splitting Lifetimes
The mutual exclusion property of mutable references can be very limiting when
working with a composite structure. The borrow checker understands some basic stuff, but
will fall over pretty easily. It *does* understand structs sufficiently to
know that it's possible to borrow disjoint fields of a struct simultaneously.
So this works today:
```rust
struct Foo {
a: i32,
b: i32,
c: i32,
}
let mut x = Foo {a: 0, b: 0, c: 0};
let a = &mut x.a;
let b = &mut x.b;
let c = &x.c;
*b += 1;
let c2 = &x.c;
*a += 10;
println!("{} {} {} {}", a, b, c, c2);
```
However borrowck doesn't understand arrays or slices in any way, so this doesn't
work:
```rust,ignore
let x = [1, 2, 3];
let a = &mut x[0];
let b = &mut x[1];
println!("{} {}", a, b);
```
```text
<anon>:3:18: 3:22 error: cannot borrow immutable indexed content `x[..]` as mutable
<anon>:3 let a = &mut x[0];
^~~~
<anon>:4:18: 4:22 error: cannot borrow immutable indexed content `x[..]` as mutable
<anon>:4 let b = &mut x[1];
^~~~
error: aborting due to 2 previous errors
```
While it was plausible that borrowck could understand this simple case, it's
pretty clearly hopeless for borrowck to understand disjointness in general
container types like a tree, especially if distinct keys actually *do* map
to the same value.
In order to "teach" borrowck that what we're doing is ok, we need to drop down
to unsafe code. For instance, mutable slices expose a `split_at_mut` function that
consumes the slice and returns *two* mutable slices. One for everything to the
left of the index, and one for everything to the right. Intuitively we know this
is safe because the slices don't alias. However the implementation requires some
unsafety:
```rust,ignore
fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
unsafe {
let self2: &mut [T] = mem::transmute_copy(&self);
(ops::IndexMut::index_mut(self, ops::RangeTo { end: mid } ),
ops::IndexMut::index_mut(self2, ops::RangeFrom { start: mid } ))
}
}
```
This is pretty plainly dangerous. We use transmute to duplicate the slice with an
*unbounded* lifetime, so that it can be treated as disjoint from the other until
we unify them when we return.
However more subtle is how iterators that yield mutable references work.
The iterator trait is defined as follows:
```rust
trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
}
```
Given this definition, Self::Item has *no* connection to `self`. This means
that we can call `next` several times in a row, and hold onto all the results
*concurrently*. This is perfectly fine for by-value iterators, which have exactly
these semantics. It's also actually fine for shared references, as they admit
arbitrarily many references to the same thing (although the
iterator needs to be a separate object from the thing being shared). But mutable
references make this a mess. At first glance, they might seem completely
incompatible with this API, as it would produce multiple mutable references to
the same object!
However it actually *does* work, exactly because iterators are one-shot objects.
Everything an IterMut yields will be yielded *at most* once, so we don't *actually*
ever yield multiple mutable references to the same piece of data.
In general all mutable iterators require *some* unsafe code *somewhere*, though.
Whether it's raw pointers, or safely composing on top of *another* IterMut.
For instance, VecDeque's IterMut:
```rust,ignore
struct IterMut<'a, T:'a> {
// The whole backing array. Some of these indices are initialized!
ring: &'a mut [T],
tail: usize,
head: usize,
}
impl<'a, T> Iterator for IterMut<'a, T> {
type Item = &'a mut T;
fn next(&mut self) -> Option<&'a mut T> {
if self.tail == self.head {
return None;
}
let tail = self.tail;
self.tail = wrap_index(self.tail.wrapping_add(1), self.ring.len());
unsafe {
// might as well do unchecked indexing since wrap_index has us
// in-bounds, and many of the "middle" indices are uninitialized
// anyway.
let elem = self.ring.get_unchecked_mut(tail);
// round-trip through a raw pointer to unbound the lifetime from
// ourselves
Some(&mut *(elem as *mut _))
}
}
}
```
A very subtle but interesting detail in this design is that it *relies on
privacy to be sound*. Borrowck works on some very simple rules. One of those rules
is that if we have a live &mut Foo and Foo contains an &mut Bar, then that &mut
Bar is *also* live. Since IterMut is always live when `next` can be called, if
`ring` were public then we could mutate `ring` while outstanding mutable borrows
to it exist!

@ -0,0 +1,87 @@
% PhantomData
When working with unsafe code, we can often end up in a situation where
types or lifetimes are logically associated with a struct, but not actually
part of a field. This most commonly occurs with lifetimes. For instance, the
`Iter` for `&'a [T]` is (approximately) defined as follows:
```rust,ignore
struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
}
```
However because `'a` is unused within the struct's body, it's *unbounded*.
Because of the troubles this has historically caused, unbounded lifetimes and
types are *illegal* in struct definitions. Therefore we must somehow refer
to these types in the body. Correctly doing this is necessary to have
correct variance and drop checking.
We do this using *PhantomData*, which is a special marker type. PhantomData
consumes no space, but simulates a field of the given type for the purpose of
static analysis. This was deemed to be less error-prone than explicitly telling
the type-system the kind of variance that you want, while also providing other
useful such as the information needed by drop check.
Iter logically contains a bunch of `&'a T`s, so this is exactly what we tell
the PhantomData to simulate:
```
use std::marker;
struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
_marker: marker::PhantomData<&'a T>,
}
```
and that's it. The lifetime will be bounded, and your iterator will be variant
over `'a` and `T`. Everything Just Works.
Another important example is Vec, which is (approximately) defined as follows:
```
struct Vec<T> {
data: *const T, // *const for variance!
len: usize,
cap: usize,
}
```
Unlike the previous example it *appears* that everything is exactly as we
want. Every generic argument to Vec shows up in the at least one field.
Good to go!
Nope.
The drop checker will generously determine that Vec<T> does not own any values
of type T. This will in turn make it conclude that it does *not* need to worry
about Vec dropping any T's in its destructor for determining drop check
soundness. This will in turn allow people to create unsoundness using
Vec's destructor.
In order to tell dropck that we *do* own values of type T, and therefore may
drop some T's when *we* drop, we must add an extra PhantomData saying exactly
that:
```
use std::marker;
struct Vec<T> {
data: *const T, // *const for covariance!
len: usize,
cap: usize,
_marker: marker::PhantomData<T>,
}
```
Raw pointers that own an allocation is such a pervasive pattern that the
standard library made a utility for itself called `Unique<T>` which:
* wraps a `*const T` for variance
* includes a `PhantomData<T>`,
* auto-derives Send/Sync as if T was contained
* marks the pointer as NonZero for the null-pointer optimization
Loading…
Cancel
Save