lifetiiiiimes

pull/10/head
Alexis Beingessner 9 years ago committed by Manish Goregaokar
parent 3d8ce59c68
commit cec46bf948

@ -61,7 +61,7 @@ only implemented automatically, and enables the following transformations:
* `T` => `Trait` where `T: Trait`
* `SubTrait` => `Trait` where `SubTrait: Trait` (TODO: is this now implied by the previous?)
* `Foo<..., T, ...>` => `Foo<..., U, ...>` where:
* T: Unsize<U>
* `T: Unsize<U>`
* `Foo` is a struct
* Only the last field has type `T`
* `T` is not part of the type of any other fields

@ -2,106 +2,360 @@
Ownership is the breakout feature of Rust. It allows Rust to be completely
memory-safe and efficient, while avoiding garbage collection. Before getting
into the ownership system in detail, we will consider a simple but *fundamental*
language-design problem.
into the ownership system in detail, we will consider the motivation of this
design.
TODO: Interior Mutability section
# The Tagged Union Problem
TODO: rewrite this to use Box instead?
The core of the lifetime and mutability system derives from a simple problem:
internal pointers to tagged unions. For instance, consider the following code:
# Living Without Garbage Collection
We will assume that you accept that garbage collection is not always an optimal
solution, and that it is desirable to manually manage memory to some extent.
If you do not accept this, might I interest you in a different language?
Regardless of your feelings on GC, it is pretty clearly a *massive* boon to
making code safe. You never have to worry about things going away *too soon*
(although whether you still *wanted* to be pointing at that thing is a different
issue...). This is a pervasive problem that C and C++ need to deal with.
Consider this simple mistake that all of us who have used a non-GC'd language
have made at one point:
```rust,ignore
fn as_str(data: &u32) -> &str {
// compute the string
let s = format!("{}", data);
// OH NO! We returned a reference to something that
// exists only in this function!
// Dangling pointer! Use after free! Alas!
// (this does not compile in Rust)
&s
}
```
This is exactly what Rust's ownership system was built to solve.
Rust knows the scope in which the `&s` lives, and as such can prevent it from
escaping. However this is a simple case that even a C compiler could plausibly
catch. Things get more complicated as code gets bigger and pointers get fed through
various functions. Eventually, a C compiler will fall down and won't be able to
perform sufficient escape analysis to prove your code unsound. It will consequently
be forced to accept your program on the assumption that it is correct.
This will never happen to Rust. It's up to the programmer to prove to the
compiler that everything is sound.
Of course, rust's story around ownership is much more complicated than just
verifying that references don't escape the scope of their referrent. That's
because ensuring pointers are always valid is much more complicated than this.
For instance in this code,
```rust,ignore
let mut data = vec![1, 2, 3];
// get an internal reference
let x = &data[0];
// OH NO! `push` causes the backing storage of `data` to be reallocated.
// Dangling pointer! User after free! Alas!
// (this does not compile in Rust)
data.push(4);
println!("{}", x);
```
naive scope analysis would be insufficient to prevent this bug, because `data`
does in fact live as long as we needed. However it was *changed* while we had
a reference into it. This is why Rust requires any references to freeze the
referrent and its owners.
# References
There are two kinds of reference:
* Shared reference: `&`
* Mutable reference: `&mut`
Which obey the following rules:
* A reference cannot outlive its referrent
* A mutable reference cannot be aliased
To define aliasing, we must define the notion of *paths* and *liveness*.
## Paths
If all Rust had were values, then every value would be uniquely owned
by a variable or composite structure. From this we naturally derive a *tree*
of ownership. The stack itself is the root of the tree, with every variable
as its direct children. Each variable's direct children would be their fields
(if any), and so on.
From this view, every value in Rust has a unique *path* in the tree of ownership.
References to a value can subsequently be interpretted as a path in this tree.
Of particular interest are *prefixes*: `x` is a prefix of `y` if `x` owns `y`
However much data doesn't reside on the stack, and we must also accomodate this.
Globals and thread-locals are simple enough to model as residing at the bottom
of the stack. However data on the heap poses a different problem.
If all Rust had on the heap was data uniquely by a pointer on the stack,
then we can just treat that pointer as a struct that owns the value on
the heap. Box, Vec, String, and HashMap, are examples of types which uniquely
own data on the heap.
Unfortunately, data on the heap is not *always* uniquely owned. Rc for instance
introduces a notion of *shared* ownership. Shared ownership means there is no
unique path. A value with no unique path limits what we can do with it. In general, only
shared references can be created to these values. However mechanisms which ensure
mutual exclusion may establish One True Owner temporarily, establishing a unique path
to that value (and therefore all its children).
The most common way to establish such a path is through *interior mutability*,
in contrast to the *inherited mutability* that everything in Rust normally uses.
Cell, RefCell, Mutex, and RWLock are all examples of interior mutability types. These
types provide exclusive access through runtime restrictions. However it is also
possible to establish unique ownership without interior mutability. For instance,
if an Rc has refcount 1, then it is safe to mutate or move its internals.
## Liveness
Roughly, a reference is *live* at some point in a program if it can be
dereferenced. Shared references are always live unless they are literally unreachable
(for instance, they reside in freed or leaked memory). Mutable references can be
reachable but *not* live through the process of *reborrowing*.
A mutable reference can be reborrowed to either a shared or mutable reference.
Further, the reborrow can produce exactly the same reference, or point to a
path it is a prefix of. For instance, a mutable reference can be reborrowed
to point to a field of its referrent:
```rust
enum Foo {
A(u32),
B(f64),
let x = &mut (1, 2);
{
// reborrow x to a subfield
let y = &mut x.0;
// y is now live, but x isn't
*y = 3;
}
// y goes out of scope, so x is live again
*x = (5, 7);
```
let mut x = B(2.0);
if let B(ref mut y) = x {
*x = A(7);
// OH NO! a u32 has been interpretted as an f64! Type-safety hole!
// (this does not actually compile)
println!("{}", y);
It is also possible to reborrow into *multiple* mutable references, as long as
they are to *disjoint*: no reference is a prefix of another. Rust
explicitly enables this to be done with disjoint struct fields, because
disjointness can be statically proven:
```
let x = &mut (1, 2);
{
// reborrow x to two disjoint subfields
let y = &mut x.0;
let z = &mut x.1;
// y and z are now live, but x isn't
*y = 3;
*z = 4;
}
// y and z go out of scope, so x is live again
*x = (5, 7);
```
The problem here is an intersection of 3 choices:
However it's often the case that Rust isn't sufficiently smart to prove that
multiple borrows are disjoint. *This does not mean it is fundamentally illegal
to make such a borrow*, just that Rust isn't as smart as you want.
* data in a tagged union is inline with the tag
* tagged unions are mutable
* being able to take a pointer into a tagged union
To simplify things, we can model variables as a fake type of reference: *owned*
references. Owned references have much the same semantics as mutable references:
they can be re-borrowed in a mutable or shared manner, which makes them no longer
live. Live owned references have the unique property that they can be moved
out of (though mutable references *can* be swapped out of). This is
only given to *live* owned references because moving its referrent would of
course invalidate all outstanding references prematurely.
Remove *any* of these 3 and the problem goes away. Traditionally, functional
languages have avoided this problem by removing the mutable
option. This means that they can in principle keep their data inline (ghc has
a pragma for this). A garbage collected imperative language like Java could alternatively
solve this problem by just keeping all variants elsewhere, so that changing the
variant of a tagged union just overwrites a pointer, and anyone with an outstanding
pointer to the inner data is unaffected thanks to The Magic Of Garbage Collection.
As a local lint against inappropriate mutation, only variables that are marked
as `mut` can be borrowed mutably.
Rust, by contrast, takes a subtler approach. Rust allows mutation,
allows pointers to inner data, and its enums have their data allocated inline.
However it prevents anything from being mutated while there are outstanding
pointers to it! And this is all done at compile time.
It is also interesting to note that Box behaves exactly like an owned
reference. It can be moved out of, and Rust understands it sufficiently to
reason about its paths like a normal variable.
Interestingly, Rust's `std::cell` module exposes two types that offer an alternative
approach to this problem:
* The `Cell` type allows mutation of aliased data, but
instead forbids internal pointers to that data. The only way to read or write
a Cell is to copy the bits in or out.
* The `RefCell` type allows mutation of aliased data *and* internal pointers, but
manages this through *runtime* checks. It is effectively a thread-unsafe
read-write lock.
For more details see Dan Grossman's *Existential Types for Imperative Languages*:
## Aliasing
* [paper][grossman-paper] (Advanced)
* [slides][grossman-slides] (Simple)
With liveness and paths defined, we can now properly define *aliasing*:
[grossman-paper]: http://homes.cs.washington.edu/~djg/papers/exists_imp.pdf
[grossman-slides]: https://homes.cs.washington.edu/~djg/slides/esop02_talk.pdf
**A mutable reference is aliased if there exists another live reference to it or
one of its prefixes.**
That's it. Super simple right? Except for the fact that it took us two pages
to define all of the terms in that defintion. You know: Super. Simple.
Actually it's a bit more complicated than that. In addition to references,
Rust has *raw pointers*: `*const T` and `*mut T`. Raw pointers have no inherent
ownership or aliasing semantics. As a result, Rust makes absolutely no effort
to track that they are used correctly, and they are wildly unsafe.
**It is an open question to what degree raw pointers have alias semantics.
However it is important for these definitions to be sound that the existence
of a raw pointer does not imply some kind of live path.**
# Lifetimes
Rust's static checks are managed by the *borrow checker* (borrowck), which tracks
mutability and outstanding loans. This analysis can in principle be done without
any help locally. However as soon as data starts crossing the function boundary,
we have some serious trouble. In principle, borrowck could be a massive
whole-program analysis engine to handle this problem, but this would be an
atrocious solution. It would be terribly slow, and errors would be horribly
non-local.
Rust enforces these rules through *lifetimes*. Lifetimes are effectively
just names for scopes on the stack, somewhere in the program. Each reference,
and anything that contains a reference, is tagged with a lifetime specifying
the scope it's valid for.
Instead, Rust tracks ownership through *lifetimes*. Every single reference and value
in Rust is tagged with a lifetime that indicates the scope it is valid for.
Rust has two kinds of reference:
Within a function body, Rust generally doesn't let you explicitly name the
lifetimes involved. This is because it's generally not really *necessary*
to talk about lifetimes in a local context; rust has all the information and
can work out everything.
* Shared reference: `&`
* Mutable reference: `&mut`
However once you cross the function boundary, you need to start talking about
lifetimes. Lifetimes are denoted with an apostrophe: `'a`, `'static`. To dip
our toes with lifetimes, we're going to pretend that we're actually allowed
to label scopes with lifetimes, and desugar the examples from the start of
this chapter.
The main rules are as follows:
Our examples made use of *aggressive* sugar around scopes and lifetimes,
because writing everything out explicitly is *extremely noisy*. All rust code
relies on aggressive inference and elision of "obvious" things.
* A shared reference can be aliased
* A mutable reference cannot be aliased
* A reference cannot outlive its referrent (`&'a T -> T: 'a`)
One particularly interesting piece of sugar is that each `let` statement implicitly
introduces a scope. For the most part, this doesn't really matter. However it
does matter for variables that refer to each other. As a simple example, let's
completely desugar this simple piece of Rust code:
However non-mutable variables have some special rules:
```rust
let x = 0;
let y = &x;
let z = &y;
```
* You cannot mutate or mutably borrow a non-mut variable,
becomes:
Only variables marked as mutable can be borrowed mutably, though this is little
more than a local lint against incorrect usage of a value.
```rust,ignore
// NOTE: `'a:` and `&'a x` is not valid syntax!
'a: {
let x: i32 = 0;
'b: {
let y: &'a i32 = &'a x;
'c: {
let z: &'b &'a i32 = &'b y;
}
}
}
```
Wow. That's... awful. Let's all take a moment to thank Rust for being a huge
pile of sugar with sugar on top.
Anyway, let's look at some of those examples from before:
```rust,ignore
fn as_str(data: &u32) -> &str {
let s = format!("{}", data);
&s
}
```
desugars to:
```rust,ignore
fn as_str<'a>(data: &'a u32) -> &'a str {
'b: {
let s = format!("{}", data);
return &'b s
}
}
```
This signature of `as_str` takes a reference to a u32 with *some* lifetime, and
promises that it can produce a reference to a str that can live *just as long*.
Already we can see why this signature might be trouble. That basically implies
that we're going to *find* a str somewhere in the scope that u32 originated in,
or somewhere *even* earlier. That's uh... a big ask.
We then proceed to compute the string `s`, and return a reference to it.
Unfortunately, since `s` was defined in the scope `'b`, the reference we're
returning can only live for that long. From the perspective of the compiler,
we've failed *twice* here. We've failed to fulfill the contract we were asked
to fulfill (`'b` is unrelated to `'a`); and we've also tried to make a reference
outlive its referrent by returning an `&'b`, where `'b` is in our function.
Shoot!
Of course, the right way to right this function is as follows:
```rust
fn to_string(data: &u32) -> String {
format!("{}", data)
}
```
We must produce an owned value inside the function to return it! The only way
we could have returned an `&'a str` would have been if it was in a field of the
`&'a u32`, which is obviously not the case.
(Actually we could have also just returned a string literal, though this limits
the behaviour of our function *just a bit*.)
How about the other example:
```rust,ignore
let mut data = vec![1, 2, 3];
let x = &data[0];
data.push(4);
println!("{}", x);
```
```rust,ignore
'a: {
let mut data: Vec<i32> = vec![1, 2, 3];
'b: {
let x: &'a i32 = Index::index(&'a data, 0);
'c: {
// Exactly what the desugar for Vec::push is is up to Rust.
// This particular desugar is a decent approximation for our
// purpose. In particular methods oft invoke a temporary borrow.
let temp: &'c mut Vec = &'c mut data;
// NOTE: Vec::push is not valid syntax
Vec::push(temp, 4);
}
println!("{}", x);
}
}
```
Here the problem is that we're trying to mutably borrow the `data` path, while
we have a reference into something it's a prefix of. Rust subsequently throws
up its hands in disgust and rejects our program. The correct way to write this
is to just re-order the code so that we make `x` *after* we push:
TODO: convince myself of this.
```rust
let mut data = vec![1, 2, 3];
data.push(4);
let x = &data[0];
println!("{}", x);
```
@ -213,7 +467,9 @@ these are unstable due to their awkward nature and questionable utility.
# Higher-Rank Lifetimes
# Higher-Rank Trait Bounds
// TODO: make aturon less mad
Generics in Rust generally allow types to be instantiated with arbitrary
associated lifetimes, but this fixes the lifetimes they work with once

Loading…
Cancel
Save