diff --git a/conversions.md b/conversions.md index 55b7b20..25d849d 100644 --- a/conversions.md +++ b/conversions.md @@ -61,7 +61,7 @@ only implemented automatically, and enables the following transformations: * `T` => `Trait` where `T: Trait` * `SubTrait` => `Trait` where `SubTrait: Trait` (TODO: is this now implied by the previous?) * `Foo<..., T, ...>` => `Foo<..., U, ...>` where: - * T: Unsize + * `T: Unsize` * `Foo` is a struct * Only the last field has type `T` * `T` is not part of the type of any other fields diff --git a/lifetimes.md b/lifetimes.md index cd6f5ad..976b4d5 100644 --- a/lifetimes.md +++ b/lifetimes.md @@ -2,106 +2,360 @@ Ownership is the breakout feature of Rust. It allows Rust to be completely memory-safe and efficient, while avoiding garbage collection. Before getting -into the ownership system in detail, we will consider a simple but *fundamental* -language-design problem. +into the ownership system in detail, we will consider the motivation of this +design. +TODO: Interior Mutability section -# The Tagged Union Problem -TODO: rewrite this to use Box instead? -The core of the lifetime and mutability system derives from a simple problem: -internal pointers to tagged unions. For instance, consider the following code: +# Living Without Garbage Collection + +We will assume that you accept that garbage collection is not always an optimal +solution, and that it is desirable to manually manage memory to some extent. +If you do not accept this, might I interest you in a different language? + +Regardless of your feelings on GC, it is pretty clearly a *massive* boon to +making code safe. You never have to worry about things going away *too soon* +(although whether you still *wanted* to be pointing at that thing is a different +issue...). This is a pervasive problem that C and C++ need to deal with. +Consider this simple mistake that all of us who have used a non-GC'd language +have made at one point: + +```rust,ignore +fn as_str(data: &u32) -> &str { + // compute the string + let s = format!("{}", data); + + // OH NO! We returned a reference to something that + // exists only in this function! + // Dangling pointer! Use after free! Alas! + // (this does not compile in Rust) + &s +} +``` + +This is exactly what Rust's ownership system was built to solve. +Rust knows the scope in which the `&s` lives, and as such can prevent it from +escaping. However this is a simple case that even a C compiler could plausibly +catch. Things get more complicated as code gets bigger and pointers get fed through +various functions. Eventually, a C compiler will fall down and won't be able to +perform sufficient escape analysis to prove your code unsound. It will consequently +be forced to accept your program on the assumption that it is correct. + +This will never happen to Rust. It's up to the programmer to prove to the +compiler that everything is sound. + +Of course, rust's story around ownership is much more complicated than just +verifying that references don't escape the scope of their referrent. That's +because ensuring pointers are always valid is much more complicated than this. +For instance in this code, + +```rust,ignore +let mut data = vec![1, 2, 3]; +// get an internal reference +let x = &data[0]; + +// OH NO! `push` causes the backing storage of `data` to be reallocated. +// Dangling pointer! User after free! Alas! +// (this does not compile in Rust) +data.push(4); + +println!("{}", x); +``` + +naive scope analysis would be insufficient to prevent this bug, because `data` +does in fact live as long as we needed. However it was *changed* while we had +a reference into it. This is why Rust requires any references to freeze the +referrent and its owners. + + + +# References + +There are two kinds of reference: + +* Shared reference: `&` +* Mutable reference: `&mut` + +Which obey the following rules: + +* A reference cannot outlive its referrent +* A mutable reference cannot be aliased + +To define aliasing, we must define the notion of *paths* and *liveness*. + + + + +## Paths + +If all Rust had were values, then every value would be uniquely owned +by a variable or composite structure. From this we naturally derive a *tree* +of ownership. The stack itself is the root of the tree, with every variable +as its direct children. Each variable's direct children would be their fields +(if any), and so on. + +From this view, every value in Rust has a unique *path* in the tree of ownership. +References to a value can subsequently be interpretted as a path in this tree. +Of particular interest are *prefixes*: `x` is a prefix of `y` if `x` owns `y` + +However much data doesn't reside on the stack, and we must also accomodate this. +Globals and thread-locals are simple enough to model as residing at the bottom +of the stack. However data on the heap poses a different problem. + +If all Rust had on the heap was data uniquely by a pointer on the stack, +then we can just treat that pointer as a struct that owns the value on +the heap. Box, Vec, String, and HashMap, are examples of types which uniquely +own data on the heap. + +Unfortunately, data on the heap is not *always* uniquely owned. Rc for instance +introduces a notion of *shared* ownership. Shared ownership means there is no +unique path. A value with no unique path limits what we can do with it. In general, only +shared references can be created to these values. However mechanisms which ensure +mutual exclusion may establish One True Owner temporarily, establishing a unique path +to that value (and therefore all its children). + +The most common way to establish such a path is through *interior mutability*, +in contrast to the *inherited mutability* that everything in Rust normally uses. +Cell, RefCell, Mutex, and RWLock are all examples of interior mutability types. These +types provide exclusive access through runtime restrictions. However it is also +possible to establish unique ownership without interior mutability. For instance, +if an Rc has refcount 1, then it is safe to mutate or move its internals. + + + + +## Liveness + +Roughly, a reference is *live* at some point in a program if it can be +dereferenced. Shared references are always live unless they are literally unreachable +(for instance, they reside in freed or leaked memory). Mutable references can be +reachable but *not* live through the process of *reborrowing*. + +A mutable reference can be reborrowed to either a shared or mutable reference. +Further, the reborrow can produce exactly the same reference, or point to a +path it is a prefix of. For instance, a mutable reference can be reborrowed +to point to a field of its referrent: ```rust -enum Foo { - A(u32), - B(f64), +let x = &mut (1, 2); +{ + // reborrow x to a subfield + let y = &mut x.0; + // y is now live, but x isn't + *y = 3; } +// y goes out of scope, so x is live again +*x = (5, 7); +``` -let mut x = B(2.0); -if let B(ref mut y) = x { - *x = A(7); - // OH NO! a u32 has been interpretted as an f64! Type-safety hole! - // (this does not actually compile) - println!("{}", y); +It is also possible to reborrow into *multiple* mutable references, as long as +they are to *disjoint*: no reference is a prefix of another. Rust +explicitly enables this to be done with disjoint struct fields, because +disjointness can be statically proven: +``` +let x = &mut (1, 2); +{ + // reborrow x to two disjoint subfields + let y = &mut x.0; + let z = &mut x.1; + // y and z are now live, but x isn't + *y = 3; + *z = 4; } +// y and z go out of scope, so x is live again +*x = (5, 7); ``` -The problem here is an intersection of 3 choices: +However it's often the case that Rust isn't sufficiently smart to prove that +multiple borrows are disjoint. *This does not mean it is fundamentally illegal +to make such a borrow*, just that Rust isn't as smart as you want. -* data in a tagged union is inline with the tag -* tagged unions are mutable -* being able to take a pointer into a tagged union +To simplify things, we can model variables as a fake type of reference: *owned* +references. Owned references have much the same semantics as mutable references: +they can be re-borrowed in a mutable or shared manner, which makes them no longer +live. Live owned references have the unique property that they can be moved +out of (though mutable references *can* be swapped out of). This is +only given to *live* owned references because moving its referrent would of +course invalidate all outstanding references prematurely. -Remove *any* of these 3 and the problem goes away. Traditionally, functional -languages have avoided this problem by removing the mutable -option. This means that they can in principle keep their data inline (ghc has -a pragma for this). A garbage collected imperative language like Java could alternatively -solve this problem by just keeping all variants elsewhere, so that changing the -variant of a tagged union just overwrites a pointer, and anyone with an outstanding -pointer to the inner data is unaffected thanks to The Magic Of Garbage Collection. +As a local lint against inappropriate mutation, only variables that are marked +as `mut` can be borrowed mutably. -Rust, by contrast, takes a subtler approach. Rust allows mutation, -allows pointers to inner data, and its enums have their data allocated inline. -However it prevents anything from being mutated while there are outstanding -pointers to it! And this is all done at compile time. +It is also interesting to note that Box behaves exactly like an owned +reference. It can be moved out of, and Rust understands it sufficiently to +reason about its paths like a normal variable. -Interestingly, Rust's `std::cell` module exposes two types that offer an alternative -approach to this problem: -* The `Cell` type allows mutation of aliased data, but -instead forbids internal pointers to that data. The only way to read or write -a Cell is to copy the bits in or out. -* The `RefCell` type allows mutation of aliased data *and* internal pointers, but -manages this through *runtime* checks. It is effectively a thread-unsafe -read-write lock. -For more details see Dan Grossman's *Existential Types for Imperative Languages*: +## Aliasing -* [paper][grossman-paper] (Advanced) -* [slides][grossman-slides] (Simple) +With liveness and paths defined, we can now properly define *aliasing*: -[grossman-paper]: http://homes.cs.washington.edu/~djg/papers/exists_imp.pdf -[grossman-slides]: https://homes.cs.washington.edu/~djg/slides/esop02_talk.pdf +**A mutable reference is aliased if there exists another live reference to it or +one of its prefixes.** + +That's it. Super simple right? Except for the fact that it took us two pages +to define all of the terms in that defintion. You know: Super. Simple. + +Actually it's a bit more complicated than that. In addition to references, +Rust has *raw pointers*: `*const T` and `*mut T`. Raw pointers have no inherent +ownership or aliasing semantics. As a result, Rust makes absolutely no effort +to track that they are used correctly, and they are wildly unsafe. + +**It is an open question to what degree raw pointers have alias semantics. +However it is important for these definitions to be sound that the existence +of a raw pointer does not imply some kind of live path.** # Lifetimes -Rust's static checks are managed by the *borrow checker* (borrowck), which tracks -mutability and outstanding loans. This analysis can in principle be done without -any help locally. However as soon as data starts crossing the function boundary, -we have some serious trouble. In principle, borrowck could be a massive -whole-program analysis engine to handle this problem, but this would be an -atrocious solution. It would be terribly slow, and errors would be horribly -non-local. +Rust enforces these rules through *lifetimes*. Lifetimes are effectively +just names for scopes on the stack, somewhere in the program. Each reference, +and anything that contains a reference, is tagged with a lifetime specifying +the scope it's valid for. -Instead, Rust tracks ownership through *lifetimes*. Every single reference and value -in Rust is tagged with a lifetime that indicates the scope it is valid for. -Rust has two kinds of reference: +Within a function body, Rust generally doesn't let you explicitly name the +lifetimes involved. This is because it's generally not really *necessary* +to talk about lifetimes in a local context; rust has all the information and +can work out everything. -* Shared reference: `&` -* Mutable reference: `&mut` +However once you cross the function boundary, you need to start talking about +lifetimes. Lifetimes are denoted with an apostrophe: `'a`, `'static`. To dip +our toes with lifetimes, we're going to pretend that we're actually allowed +to label scopes with lifetimes, and desugar the examples from the start of +this chapter. -The main rules are as follows: +Our examples made use of *aggressive* sugar around scopes and lifetimes, +because writing everything out explicitly is *extremely noisy*. All rust code +relies on aggressive inference and elision of "obvious" things. -* A shared reference can be aliased -* A mutable reference cannot be aliased -* A reference cannot outlive its referrent (`&'a T -> T: 'a`) +One particularly interesting piece of sugar is that each `let` statement implicitly +introduces a scope. For the most part, this doesn't really matter. However it +does matter for variables that refer to each other. As a simple example, let's +completely desugar this simple piece of Rust code: -However non-mutable variables have some special rules: +```rust +let x = 0; +let y = &x; +let z = &y; +``` -* You cannot mutate or mutably borrow a non-mut variable, +becomes: -Only variables marked as mutable can be borrowed mutably, though this is little -more than a local lint against incorrect usage of a value. +```rust,ignore +// NOTE: `'a:` and `&'a x` is not valid syntax! +'a: { + let x: i32 = 0; + 'b: { + let y: &'a i32 = &'a x; + 'c: { + let z: &'b &'a i32 = &'b y; + } + } +} +``` +Wow. That's... awful. Let's all take a moment to thank Rust for being a huge +pile of sugar with sugar on top. +Anyway, let's look at some of those examples from before: + +```rust,ignore +fn as_str(data: &u32) -> &str { + let s = format!("{}", data); + &s +} +``` + +desugars to: + +```rust,ignore +fn as_str<'a>(data: &'a u32) -> &'a str { + 'b: { + let s = format!("{}", data); + return &'b s + } +} +``` + +This signature of `as_str` takes a reference to a u32 with *some* lifetime, and +promises that it can produce a reference to a str that can live *just as long*. +Already we can see why this signature might be trouble. That basically implies +that we're going to *find* a str somewhere in the scope that u32 originated in, +or somewhere *even* earlier. That's uh... a big ask. + +We then proceed to compute the string `s`, and return a reference to it. +Unfortunately, since `s` was defined in the scope `'b`, the reference we're +returning can only live for that long. From the perspective of the compiler, +we've failed *twice* here. We've failed to fulfill the contract we were asked +to fulfill (`'b` is unrelated to `'a`); and we've also tried to make a reference +outlive its referrent by returning an `&'b`, where `'b` is in our function. + +Shoot! + +Of course, the right way to right this function is as follows: + +```rust +fn to_string(data: &u32) -> String { + format!("{}", data) +} +``` + +We must produce an owned value inside the function to return it! The only way +we could have returned an `&'a str` would have been if it was in a field of the +`&'a u32`, which is obviously not the case. + +(Actually we could have also just returned a string literal, though this limits +the behaviour of our function *just a bit*.) + +How about the other example: + +```rust,ignore +let mut data = vec![1, 2, 3]; +let x = &data[0]; +data.push(4); +println!("{}", x); +``` + +```rust,ignore +'a: { + let mut data: Vec = vec![1, 2, 3]; + 'b: { + let x: &'a i32 = Index::index(&'a data, 0); + 'c: { + // Exactly what the desugar for Vec::push is is up to Rust. + // This particular desugar is a decent approximation for our + // purpose. In particular methods oft invoke a temporary borrow. + let temp: &'c mut Vec = &'c mut data; + // NOTE: Vec::push is not valid syntax + Vec::push(temp, 4); + } + println!("{}", x); + } +} +``` + +Here the problem is that we're trying to mutably borrow the `data` path, while +we have a reference into something it's a prefix of. Rust subsequently throws +up its hands in disgust and rejects our program. The correct way to write this +is to just re-order the code so that we make `x` *after* we push: + +TODO: convince myself of this. + +```rust +let mut data = vec![1, 2, 3]; +data.push(4); + +let x = &data[0]; +println!("{}", x); +``` @@ -213,7 +467,9 @@ these are unstable due to their awkward nature and questionable utility. -# Higher-Rank Lifetimes +# Higher-Rank Trait Bounds + +// TODO: make aturon less mad Generics in Rust generally allow types to be instantiated with arbitrary associated lifetimes, but this fixes the lifetimes they work with once