% Concurrency and Parallelism

```Not sure if I want this
Safe Rust features *a ton* of tooling to make concurrency and parallelism totally
safe, easy, and fearless. This is a case where we'll really just
[defer to TRPL][trpl-conc] for the basics.

TL;DR: The `Send` and `Sync` traits in conjunction with Rust's ownership model and
normal generic bounds make using concurrent APIs really easy and painless for
a user of Safe Rust.
```

## Data Races and Race Conditions

Safe Rust guarantees an absence of data races, which are defined as:

* two or more threads concurrently accessing a location of memory
* one of them is a write
* one of them is unsynchronized

A data race has Undefined Behaviour, and is therefore impossible to perform
in Safe Rust. Data races are *mostly* prevented through Rust's ownership system:
it's impossible to alias a mutable reference, so it's impossible to perform a
data race. Interior mutability makes this more complicated, which is largely why
we have the Send and Sync traits (see below).

However Rust *does not* prevent general race conditions. This is
pretty fundamentally impossible, and probably honestly undesirable. Your hardware
is racy, your OS is racy, the other programs on your computer are racy, and the
world this all runs in is racy. Any system that could genuinely claim to prevent
*all* race conditions would be pretty awful to use, if not just incorrect.

So it's perfectly "fine" for a Safe Rust program to get deadlocked or do
something incredibly stupid with incorrect synchronization. Obviously such a
program isn't very good, but Rust can only hold your hand so far. Still, a
race condition can't violate memory safety in a Rust program on
its own. Only in conjunction with some other unsafe code can a race condition
actually violate memory safety. For instance:

```rust
use std::thread;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

let data = vec![1, 2, 3, 4];
// Arc so that the memory the AtomicUsize is stored in still exists for
// the other thread to increment, even if we completely finish executing
// before it. Rust won't compile the program without it, because of the
// lifetime requirements of thread::spawn!
let idx = Arc::new(AtomicUsize::new(0));
let other_idx = idx.clone();

// `move` captures other_idx by-value, moving it into this thread
thread::spawn(move || {
    // It's ok to mutate idx because this value
    // is an atomic, so it can't cause a Data Race.
    other_idx.fetch_add(10, Ordering::SeqCst);
});

// Index with the value loaded from the atomic. This is safe because we
// read the atomic memory only once, and then pass a *copy* of that value
// to the Vec's indexing implementation. This indexing will be correctly
// bounds checked, and there's no chance of the value getting changed
// in the middle. However our program may panic if the thread we spawned
// managed to increment before this ran. A race condition because correct
// program execution (panicking is rarely correct) depends on order of
// thread execution.
println!("{}", data[idx.load(Ordering::SeqCst)]);

if idx.load(Ordering::SeqCst) < data.len() {
    unsafe {
        // Incorrectly loading the idx *after* we did the bounds check.
        // It could have changed. This is a race condition, *and dangerous*
        // because we decided to do `get_unchecked`, which is `unsafe`.
        println!("{}", data.get_unchecked(idx.load(Ordering::SeqCst)));
    }
}
```
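For contrast, here is one way to take the memory-safety hazard out of the second half of that example. This is a minimal sketch, assuming the goal is simply to read the element when the index is in bounds: it loads the atomic *once* and uses the bounds-checked `slice::get`, so the check and the access always see the same value. The outcome still depends on thread timing (the race condition remains), but it can no longer read out of bounds. The helper name `load_checked` is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical helper: returns the element at the current index, if any.
// The atomic is read exactly once, so the bounds check and the access agree.
fn load_checked(data: &[i32], idx: &AtomicUsize) -> Option<i32> {
    let i = idx.load(Ordering::SeqCst);
    // `get` performs the bounds check on the same `i` we just loaded.
    data.get(i).copied()
}

fn main() {
    let data = vec![1, 2, 3, 4];
    let idx = Arc::new(AtomicUsize::new(0));
    let other_idx = Arc::clone(&idx);

    let handle = thread::spawn(move || {
        other_idx.fetch_add(10, Ordering::SeqCst);
    });

    // Depending on scheduling this prints Some(1) or None,
    // but it never reads out of bounds.
    println!("{:?}", load_checked(&data, &idx));

    handle.join().unwrap();
    // After the join the increment has definitely happened: 10 is out of bounds.
    assert_eq!(load_checked(&data, &idx), None);
}
```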

## Send and Sync

Not everything obeys inherited mutability, though. Some types allow you to multiply
alias a location in memory while mutating it. Unless these types use synchronization
to manage this access, they are absolutely not thread safe. Rust captures this
through the `Send` and `Sync` traits.

* A type is Send if it is safe to send it to another thread.
* A type is Sync if it is safe to share between threads (that is, `&T` is Send).

Send and Sync are *very* fundamental to Rust's concurrency story. As such, a
substantial amount of special tooling exists to make them work right. First and
foremost, they're *unsafe traits*. This means that they are unsafe *to implement*,
and other unsafe code can *trust* that they are correctly implemented. Since
they're *marker traits* (they have no associated items like methods), correctly
implemented simply means that they have the intrinsic properties an implementor
should have. Incorrectly implementing Send or Sync can cause Undefined Behaviour.

Send and Sync are also what Rust calls *opt-in builtin traits*.
This means that, unlike every other trait, they are *automatically* derived:
if a type is composed entirely of Send or Sync types, then it is Send or Sync.
Almost all primitives are Send and Sync, and as a consequence pretty much
all types you'll ever interact with are Send and Sync.

Major exceptions include:

* raw pointers are neither Send nor Sync (because they have no safety guards)
* `UnsafeCell` isn't Sync (and therefore `Cell` and `RefCell` aren't)
* `Rc` isn't Send or Sync (because the refcount is shared and unsynchronized)
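One way to see these rules in action is a pair of empty generic functions whose only job is to impose a `Send` or `Sync` bound, so that the compiler checks the claim for us. This is a sketch; `assert_send` and `assert_sync` are hypothetical helper names, not standard library items.

```rust
use std::cell::{Cell, RefCell};
use std::sync::{Arc, Mutex};

// Hypothetical helpers: a call compiles only if `T` satisfies the bound.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    // Primitives, and types built entirely from them, are Send and Sync.
    assert_send::<u32>();
    assert_sync::<u32>();
    assert_send::<Vec<String>>();
    assert_sync::<Vec<String>>();

    // Cell and RefCell can be *moved* to another thread (Send),
    // but not *shared* through `&` (they are not Sync).
    assert_send::<Cell<u32>>();
    assert_send::<RefCell<u32>>();
    // assert_sync::<Cell<u32>>(); // compile error: Cell<u32> is not Sync

    // Arc<Mutex<T>> is the thread-safe counterpart of Rc<RefCell<T>>.
    assert_send::<Arc<Mutex<u32>>>();
    assert_sync::<Arc<Mutex<u32>>>();

    // Each of these is a compile error if uncommented:
    // assert_send::<std::rc::Rc<u32>>();
    // assert_send::<*mut u8>();
}
```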

`Rc` and `UnsafeCell` are very fundamentally not thread-safe: they enable
unsynchronized shared mutable state. However raw pointers are, strictly speaking,
marked as thread-unsafe as more of a *lint*. Doing anything useful
with a raw pointer requires dereferencing it, which is already unsafe. In that
sense, one could argue that it would be "fine" for them to be marked as thread safe.

However it's important that they aren't thread safe to prevent types that
*contain them* from being automatically marked as thread safe. These types have
non-trivial untracked ownership, and it's unlikely that their author was
necessarily thinking hard about thread safety. In the case of Rc, we have a nice
example of a type that contains a `*mut` that is *definitely* not thread safe.

Types that aren't automatically derived can *opt-in* to Send and Sync by simply
implementing them:

```rust
struct MyBox(*mut u8);

unsafe impl Send for MyBox {}
unsafe impl Sync for MyBox {}
```

In the *incredibly rare* case that a type is *inappropriately* automatically
derived to be Send or Sync, then one can also *unimplement* Send and Sync:

```rust
// Negative impls are unstable: this needs a nightly compiler
// with `#![feature(negative_impls)]`.
struct SpecialThreadToken(u8);

impl !Send for SpecialThreadToken {}
impl !Sync for SpecialThreadToken {}
```

Note that *in and of itself* it is impossible to incorrectly derive Send and Sync.
Only types that are ascribed special meaning by other unsafe code can possibly cause
trouble by being incorrectly Send or Sync.

Most uses of raw pointers should be encapsulated behind a sufficient abstraction
that Send and Sync can be derived. For instance all of Rust's standard
collections are Send and Sync (when they contain Send and Sync types)
in spite of their pervasive use of raw pointers to
manage allocations and complex ownership. Similarly, most iterators into these
collections are Send and Sync because they largely behave like an `&` or `&mut`
into the collection.
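As a rough illustration of that kind of encapsulation, here is a sketch of a hypothetical owning pointer type, `OwnedPtr<T>`, built on a raw `*mut T`. The raw pointer suppresses the automatic impls, so the type opts back in with `unsafe impl`, bounded on `T` the same way `Box<T>` is: it uniquely owns its `T`, so sending or sharing it is exactly as safe as sending or sharing the `T`. This is a simplified sketch, not how the standard library actually implements `Box`.

```rust
use std::thread;

// Hypothetical owning pointer: a stripped-down Box built on a raw pointer.
struct OwnedPtr<T> {
    ptr: *mut T,
}

impl<T> OwnedPtr<T> {
    fn new(value: T) -> Self {
        // Box::into_raw hands us ownership of the heap allocation.
        OwnedPtr { ptr: Box::into_raw(Box::new(value)) }
    }

    fn get(&self) -> &T {
        // Sound: `ptr` came from Box::into_raw and is freed only in Drop.
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for OwnedPtr<T> {
    fn drop(&mut self) {
        // Rebuild the Box so the value and its allocation are freed exactly once.
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}

// We uniquely own the T, so thread safety reduces to that of T itself.
unsafe impl<T: Send> Send for OwnedPtr<T> {}
unsafe impl<T: Sync> Sync for OwnedPtr<T> {}

fn main() {
    let p = OwnedPtr::new(String::from("hello"));
    // Moving `p` into another thread only compiles because of the Send impl.
    let len = thread::spawn(move || p.get().len()).join().unwrap();
    assert_eq!(len, 5);
}
```

Without the two `unsafe impl` lines, the `thread::spawn` call is rejected, because the `*mut T` field makes `OwnedPtr<T>` neither Send nor Sync.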

TODO: better explain what can or can't be Send or Sync. Sufficient to appeal
only to data races?

## Atomics

Rust pretty blatantly just inherits LLVM's model for atomics, which in turn is
largely based off of the C11 model for atomics. This is not due to these models
being particularly excellent or easy to understand. Indeed, these models are
quite complex and are known to have several flaws. Rather, it is a pragmatic
concession to the fact that *everyone* is pretty bad at modeling atomics. At the very
least, we can benefit from existing tooling and research around C's model.

Trying to fully explain these models is fairly hopeless, so we're just going to
drop that problem in [LLVM's lap][llvm-conc].
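Even without the full model, the most basic guarantee is easy to demonstrate. In this minimal sketch (the thread and iteration counts are arbitrary, and `count_concurrently` is a hypothetical name), several threads increment a shared `AtomicUsize` with `fetch_add`. Because each increment is a single indivisible read-modify-write, no update is lost and the final count is exact. `SeqCst` is used only because it is the simplest ordering to reason about, not because this program requires it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Spawns `threads` threads that each add 1 to a shared counter `per_thread` times.
fn count_concurrently(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // One indivisible read-modify-write: no increment is lost.
                    counter.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(count_concurrently(4, 1000), 4000);
}
```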

## Actually Doing Things Concurrently

Rust as a language doesn't *really* have an opinion on how to do concurrency or
parallelism. The standard library exposes OS threads and blocking sys-calls
because *everyone* has those and they're uniform enough that you can provide
an abstraction over them in a relatively uncontroversial way. Message passing,
green threads, and async APIs are all diverse enough that any abstraction over
them tends to involve trade-offs that we weren't willing to commit to for 1.0.

However Rust's current design is set up so that you can build your own
concurrent paradigm or library as you see fit. Just require the right
lifetimes and Send and Sync where appropriate and everything should Just Work
with everyone else's stuff.
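For example, a library offering its own way to run work in the background only needs to state the same bounds that `std::thread::spawn` does: the closure and its result must be `Send` (they cross a thread boundary) and `'static` (the new thread may outlive the caller). The sketch below wraps `thread::spawn` in a hypothetical `run_in_background` function that returns a channel receiver; the name and design are illustrative assumptions, not an existing API.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

// Hypothetical library API: run `job` on a fresh thread and deliver its result
// through a channel. The bounds mirror those of `std::thread::spawn`.
fn run_in_background<F, T>(job: F) -> Receiver<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the error if the receiver was dropped before we finished.
        let _ = tx.send(job());
    });
    rx
}

fn main() {
    let numbers = vec![1u64, 2, 3, 4];
    // `numbers` is moved into the closure; Vec<u64> is Send, so this compiles.
    let rx = run_in_background(move || numbers.iter().sum::<u64>());
    assert_eq!(rx.recv().unwrap(), 10);

    // Capturing an `Rc` would be rejected at compile time, since `Rc` is not Send:
    // let rc = std::rc::Rc::new(5);
    // run_in_background(move || *rc);
}
```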

[llvm-conc]: http://llvm.org/docs/Atomics.html
[trpl-conc]: https://doc.rust-lang.org/book/concurrency.html

% The Unsafe Rust Programming Language

This document seeks to complement [The Rust Programming Language][] (TRPL).
Where TRPL introduces the language and teaches the basics, TURPL dives deep into
the specification of the language, and all the nasty bits necessary to write
Unsafe Rust. TURPL does not assume you have read TRPL, but does assume you know
the basics of the language and systems programming. We will not explain the
stack or heap, and we will not explain the syntax.

## A Tale Of Two Languages

Rust can be thought of as two different languages: Safe Rust, and Unsafe Rust.
Any time someone opines on the guarantees of Rust, they are almost surely talking about
Safe Rust. However Safe Rust is not sufficient to write every program. For that,
we need the Unsafe Rust superset.

Most fundamentally, writing bindings to other languages
(such as the C exposed by your operating system) is never going to be safe. Rust
can't control what other languages do to program execution! However Unsafe Rust is
also necessary to construct fundamental abstractions where the type system is not
sufficient to automatically prove what you're doing is sound.

Indeed, the Rust standard library is implemented in Rust, and it makes substantial
use of Unsafe Rust for implementing IO, memory allocation, collections,
synchronization, and other low-level computational primitives.

Upon hearing this, many wonder why they would not simply just use C or C++ in place of
Rust (or just use a "real" safe language). If we're going to do unsafe things, why not
lean on these much more established languages?

The most important difference between C++ and Rust is a matter of defaults:
Rust is 100% safe by default. Even when you *opt out* of safety in Rust, it is a modular
action. Deciding to work with unchecked uninitialized memory does not
suddenly make dangling or null pointers a problem. When using unchecked indexing on `x`,
one does not have to suddenly worry about indexing out of bounds on `y`.

C and C++, by contrast, have pervasive unsafety baked into the language. Even the
modern best practices like `unique_ptr` have various safety pitfalls.

It should also be noted that writing Unsafe Rust should be regarded as an exceptional
action. Unsafe Rust is often the domain of *fundamental libraries*: anything that needs
to make FFI bindings or define core abstractions. These fundamental libraries then expose
a *safe* interface for intermediate libraries and applications to build upon. And these
safe interfaces make an important promise: if your application segfaults, it's not your
fault. *They* have a bug.

And really, how is that different from *any* safe language? Python, Ruby, and Java libraries
can internally do all sorts of nasty things. The languages themselves are no
different. Safe languages regularly have bugs that cause critical vulnerabilities.
The fact that Rust is written with a healthy spoonful of Unsafe Rust is no different.
However it *does* mean that Rust doesn't need to fall back to the pervasive unsafety of
C to do the nasty things that need to get done.

## What does `unsafe` mean?

Rust tries to model memory safety through the `unsafe` keyword. Interestingly,
the meaning of `unsafe` largely revolves around what
its *absence* means. If the `unsafe` keyword is absent from a program, it should
not be possible to violate memory safety under *any* conditions. The presence
of `unsafe` means that there are conditions under which this code *could*
violate memory safety.

To be more concrete, Rust cares about preventing the following things:

* Dereferencing null/dangling pointers
* Reading uninitialized memory
* Breaking the pointer aliasing rules (TBD) (llvm rules + noalias on &mut and & w/o UnsafeCell)
* Invoking Undefined Behaviour (in e.g. compiler intrinsics)
* Producing invalid primitive values:
    * dangling/null references
    * a `bool` that isn't 0 or 1
    * an undefined `enum` discriminant
    * a `char` larger than `char::MAX`
    * a non-utf8 `str`
* Unwinding into an FFI function
* Causing a data race

However, libraries are free to declare arbitrary requirements if they could transitively
cause memory safety issues. Otherwise Rust is quite permissive with respect to
other dubious operations. Rust considers it "safe" to:

* Deadlock
* Leak memory
* Fail to call destructors
* Access private fields
* Overflow integers
* Delete the production database

However any program that does such a thing is *probably* incorrect. Rust just isn't
interested in modeling these problems, as they are much harder to prevent in general,
and it's basically impossible to prevent incorrect programs from getting written.

There are several places `unsafe` can appear in Rust today, which can largely be
grouped into two categories:

* There are unchecked contracts here. To declare you understand this, I require
  you to write `unsafe` elsewhere:
    * On functions, `unsafe` is declaring the function to be unsafe to call. Users
      of the function must check the documentation to determine what this means,
      and then have to write `unsafe` somewhere to identify that they're aware of
      the danger.
    * On trait declarations, `unsafe` is declaring that *implementing* the trait
      is an unsafe operation, as it has contracts that other unsafe code is free to
      trust blindly.

* I am declaring that I have, to the best of my knowledge, adhered to the
  unchecked contracts:
    * On trait implementations, `unsafe` is declaring that the contract of the
      `unsafe` trait has been upheld.
    * On blocks, `unsafe` is declaring any unsafety from an unsafe
      operation to be handled, and therefore the parent function is safe.
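A small sketch of the function and block cases working together: a hypothetical `unsafe fn` states its contract in its documentation, and a safe wrapper checks that contract itself before entering an `unsafe` block. Both function names are invented for illustration; `slice::get_unchecked` is the real standard library method doing the unchecked work.

```rust
/// Returns the first element without a bounds check.
///
/// # Safety
///
/// The caller must guarantee that `slice` is non-empty.
// Hypothetical function: `unsafe` here means "unsafe to call".
unsafe fn first_unchecked(slice: &[i32]) -> i32 {
    // Relies on the caller-upheld contract above.
    unsafe { *slice.get_unchecked(0) }
}

// Hypothetical safe wrapper: no `unsafe` in its signature, so it must be
// sound for every possible input.
fn first_or_zero(slice: &[i32]) -> i32 {
    if slice.is_empty() {
        0
    } else {
        // The `unsafe` block asserts that we upheld the contract:
        // we just checked that the slice is non-empty.
        unsafe { first_unchecked(slice) }
    }
}

fn main() {
    assert_eq!(first_or_zero(&[7, 8, 9]), 7);
    assert_eq!(first_or_zero(&[]), 0);
}
```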

There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for
historical reasons and is in the process of being phased out. See the section on
destructors for details.

Some examples of unsafe functions:

* `slice::get_unchecked` will perform unchecked indexing, allowing memory
  safety to be freely violated.
* `ptr::offset` is an intrinsic that invokes Undefined Behaviour if it is
  not "in bounds" as defined by LLVM (see the lifetimes section for details).
* `mem::transmute` reinterprets some value as having the given type,
  bypassing type safety in arbitrary ways (see the conversions section for details).
* All FFI functions are `unsafe` because they can do arbitrary things.
  C is an obvious culprit, but generally any language can do something
  that Rust isn't happy about (see the FFI section for details).

As of Rust 1.0 there are exactly two unsafe traits:

* `Send` is a marker trait (it has no actual API) that promises implementors
  are safe to send to another thread.
* `Sync` is a marker trait that promises that threads can safely share
  implementors through a shared reference.

All other traits that declare any kind of contract *really* can't be trusted
to adhere to their contract when memory-safety is at stake. For instance Rust has
`PartialOrd` and `Ord` to differentiate between types which can "just" be
compared and those that implement a total ordering. However you can't actually
trust an implementor of `Ord` to actually provide a total ordering if failing to
do so causes you to e.g. index out of bounds. But if it just makes your program
do a stupid thing, then it's "fine" to rely on `Ord`.

The reason this is the case is that `Ord` is safe to implement, and it should be
impossible for bad *safe* code to violate memory safety. Rust has traditionally
avoided making traits unsafe because it makes `unsafe` pervasive in the language,
which is not desirable. The only reason `Send` and `Sync` are unsafe is because
thread safety is a sort of fundamental thing that a program can't really guard
against locally (even by-value message passing still requires a notion of Send).
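To make the distinction concrete, here is a sketch with a hypothetical `unsafe trait KnownLen` whose contract (`known_len()` equals the length of `as_slice()`) unsafe code is allowed to rely on. The generic function trusts that promise to skip bounds checks, which is acceptable only because implementing the trait requires writing `unsafe impl`. None of these names exist in the standard library.

```rust
/// # Safety
///
/// Implementors promise that `known_len()` equals `as_slice().len()`.
// Hypothetical unsafe trait: unsafe *to implement*, trusted by unsafe code.
unsafe trait KnownLen {
    fn as_slice(&self) -> &[u32];
    fn known_len(&self) -> usize;
}

struct Triple([u32; 3]);

// The `unsafe impl` asserts that we uphold the contract: the array has 3 items.
unsafe impl KnownLen for Triple {
    fn as_slice(&self) -> &[u32] {
        &self.0
    }
    fn known_len(&self) -> usize {
        3
    }
}

// Safe generic code that relies on the trait's contract inside `unsafe`.
fn sum_all<T: KnownLen>(items: &T) -> u32 {
    let slice = items.as_slice();
    let mut total = 0;
    for i in 0..items.known_len() {
        // Sound only because every `KnownLen` impl promises `i < slice.len()`.
        total += unsafe { *slice.get_unchecked(i) };
    }
    total
}

fn main() {
    assert_eq!(sum_all(&Triple([1, 2, 3])), 6);
}
```

If `KnownLen` were an ordinary safe trait, any safe code could implement it with a wrong length and `sum_all` would read out of bounds. With a safe trait such as `Ord`, unsafe code has to defend itself instead, for instance by using the checked `get`.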

% Ownership

Ownership is the breakout feature of Rust. It allows Rust to be completely
memory-safe and efficient, while avoiding garbage collection. Before getting
into the ownership system in detail, we will consider a simple but *fundamental*
language-design problem.

## The Tagged Union Problem

The core of the lifetime and mutability system derives from a simple problem:
internal pointers to tagged unions. For instance, consider the following code:

```rust
enum Foo {
    A(u32),
    B(f64),
}

let mut x = Foo::B(2.0);
if let Foo::B(ref mut y) = x {
    x = Foo::A(7);
    // OH NO! a u32 has been interpreted as an f64! Type-safety hole!
    // (this does not actually compile)
    println!("{}", y);
}
```

The problem here is an intersection of 3 choices:

* data in a tagged union is inline with the tag
* tagged unions are mutable
* being able to take a pointer into a tagged union

Remove *any* of these 3 and the problem goes away. Traditionally, functional
languages have avoided this problem by removing the mutable
option. This means that they can in principle keep their data inline (ghc has
a pragma for this). A garbage collected imperative language like Java could alternatively
solve this problem by just keeping all variants elsewhere, so that changing the
variant of a tagged union just overwrites a pointer, and anyone with an outstanding
pointer to the inner data is unaffected thanks to The Magic Of Garbage Collection.

Rust, by contrast, takes a subtler approach. Rust allows mutation,
allows pointers to inner data, and its enums have their data allocated inline.
However it prevents anything from being mutated while there are outstanding
pointers to it! And this is all done at compile time.
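For comparison, here is a variant of the earlier example that the borrow checker accepts. The inner value is *copied* out rather than borrowed, so no pointer into `x` is outstanding when `x` is overwritten. This is a minimal sketch reusing the `Foo` enum from above; the helper name `swap_variant` is hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Foo {
    A(u32),
    B(f64),
}

// Hypothetical helper: reads the f64 out of `B`, then replaces the variant.
fn swap_variant(x: &mut Foo) -> Option<f64> {
    // `y` is a plain f64 copied out of the enum, not a pointer into it,
    // so nothing borrows from `*x` once the match is done.
    let old = match *x {
        Foo::B(y) => Some(y),
        Foo::A(_) => None,
    };
    if old.is_some() {
        // Allowed: no outstanding pointers into `x`.
        *x = Foo::A(7);
    }
    old
}

fn main() {
    let mut x = Foo::B(2.0);
    assert_eq!(swap_variant(&mut x), Some(2.0));
    assert_eq!(x, Foo::A(7));
}
```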

Interestingly, Rust's `std::cell` module exposes two types that offer an alternative
approach to this problem:

* The `Cell` type allows mutation of aliased data, but
  instead forbids internal pointers to that data. The only way to read or write
  a Cell is to copy the bits in or out.

* The `RefCell` type allows mutation of aliased data *and* internal pointers, but
  manages this through *runtime* checks. It is effectively a thread-unsafe
  read-write lock.
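A short sketch of both types in action, using only their standard APIs (`Cell::get`/`set`, `RefCell::borrow`/`borrow_mut`/`try_borrow_mut`). `Cell` hands out copies, never references, while `RefCell` hands out guard objects and refuses a mutable borrow at runtime while a shared one is still alive.

```rust
use std::cell::{Cell, RefCell};

fn main() {
    // Cell: mutation through a shared reference, but only by copying in/out.
    let counter = Cell::new(1u32);
    let alias = &counter; // a second shared path to the same data
    alias.set(alias.get() + 1);
    assert_eq!(counter.get(), 2);

    // RefCell: references into the data, checked at runtime.
    let cell = RefCell::new(vec![1, 2, 3]);
    {
        let reader = cell.borrow(); // shared "read lock"
        // A mutable borrow while `reader` is alive is refused, not UB.
        assert!(cell.try_borrow_mut().is_err());
        assert_eq!(reader.len(), 3);
    } // `reader` is dropped here, releasing the shared borrow
    cell.borrow_mut().push(4); // now succeeds
    assert_eq!(cell.borrow().len(), 4);
}
```

Calling `borrow_mut` instead of `try_borrow_mut` while `reader` is alive would panic rather than hand out an aliased mutable reference.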

## Lifetimes

Rust's static checks are managed by the *borrow checker* (borrowck), which tracks
mutability and outstanding loans. This analysis can in principle be done without
any help locally. However as soon as data starts crossing the function boundary,
we have some serious trouble. In principle, borrowck could be a massive
whole-program analysis engine to handle this problem, but this would be an
atrocious solution. It would be terribly slow, and errors would be horribly
non-local.

Instead, Rust tracks ownership through *lifetimes*. Every single reference and value
in Rust is tagged with a lifetime that indicates the scope it is valid for.
Rust has two kinds of reference:

* Shared reference: `&`
* Mutable reference: `&mut`

The main rules are as follows:

* A shared reference can be aliased
* A mutable reference cannot be aliased
* A reference cannot outlive its referent (`&'a T -> T: 'a`)

However non-mutable variables have some special rules:

* You cannot mutate or mutably borrow a non-mut variable.

Only variables marked as mutable can be borrowed mutably, though this is little
more than a local lint against incorrect usage of a value.

## Weird Lifetimes

Almost always, the mutability of a lifetime can be derived from the mutability
of the reference it is attached to. However this is not necessarily the case.
For instance in the following code:

```rust
fn foo<'a>(input: &'a mut u8) -> &'a u8 { &*input }
```

One would expect the output of foo to be an immutable lifetime. However we have
derived it from the input, which is a mutable lifetime. So although we have a
shared reference, it will have the much more limited aliasing rules of a mutable
reference. As a consequence, there is no expressive benefit in a method that
mutates returning a shared reference.

## Lifetime Elision

In order to make common patterns more ergonomic, Rust allows lifetimes to be
*elided* in function, impl, and type signatures.

A *lifetime position* is anywhere you can write a lifetime in a type:

```rust
&'a T
&'a mut T
T<'a>
```

Lifetime positions can appear as either "input" or "output":

* For `fn` definitions, input refers to the types of the formal arguments
  in the `fn` definition, while output refers to
  result types. So `fn foo(s: &str) -> (&str, &str)` has elided one lifetime in
  input position and two lifetimes in output position.
  Note that the input positions of a `fn` method definition do not
  include the lifetimes that occur in the method's `impl` header
  (nor lifetimes that occur in the trait header, for a default method).

* In the future, it should be possible to elide `impl` headers in the same manner.

Elision rules are as follows:

* Each elided lifetime in input position becomes a distinct lifetime
  parameter.

* If there is exactly one input lifetime position (elided or not), that lifetime
  is assigned to *all* elided output lifetimes.

* If there are multiple input lifetime positions, but one of them is `&self` or
  `&mut self`, the lifetime of `self` is assigned to *all* elided output lifetimes.

* Otherwise, it is an error to elide an output lifetime.

Examples:

```rust
fn print(s: &str);                                      // elided
fn print<'a>(s: &'a str);                               // expanded

fn debug(lvl: usize, s: &str);                          // elided
fn debug<'a>(lvl: usize, s: &'a str);                   // expanded

fn substr(s: &str, until: usize) -> &str;               // elided
fn substr<'a>(s: &'a str, until: usize) -> &'a str;     // expanded

fn get_str() -> &str;                                   // ILLEGAL

fn frob(s: &str, t: &str) -> &str;                      // ILLEGAL

fn get_mut(&mut self) -> &mut T;                        // elided
fn get_mut<'a>(&'a mut self) -> &'a mut T;              // expanded

fn args<T:ToCStr>(&mut self, args: &[T]) -> &mut Command;                  // elided
fn args<'a, 'b, T:ToCStr>(&'a mut self, args: &'b [T]) -> &'a mut Command; // expanded

fn new(buf: &mut [u8]) -> BufWriter;                    // elided
fn new<'a>(buf: &'a mut [u8]) -> BufWriter<'a>;         // expanded
```
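The same rules applied to code that actually compiles: a minimal sketch in which every output lifetime is elided and filled in by the rules above. `first_word`, `Config`, and `label` are hypothetical names chosen for illustration.

```rust
// Exactly one input lifetime: the returned &str borrows from `s`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

struct Config {
    name: String,
}

impl Config {
    // Two input lifetimes, but one is `&self`, so the output borrows from
    // `self`. Returning `_hint` instead would be a compile error, because its
    // lifetime is unrelated to the elided output lifetime.
    fn label(&self, _hint: &str) -> &str {
        &self.name
    }
}

fn main() {
    let text = String::from("hello world");
    assert_eq!(first_word(&text), "hello");

    let config = Config { name: String::from("demo") };
    assert_eq!(config.label("ignored"), "demo");
}
```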

## Unbounded Lifetimes

Unsafe code can often end up producing references or lifetimes out of thin air.
Such lifetimes come into the world as *unbounded*. The most common source of this
is dereferencing a raw pointer, which produces a reference with an unbounded lifetime.
Such a lifetime becomes as big as context demands. This is in fact more powerful
than simply becoming `'static`, because for instance `&'static &'a T`
will fail to typecheck, but the unbound lifetime will perfectly mold into
`&'a &'a T` as needed. However for most intents and purposes, such an unbounded
lifetime can be regarded as `'static`.

Almost no reference is `'static`, so this is probably wrong. `transmute` and
`transmute_copy` are the two other primary offenders. One should endeavour to
bound an unbounded lifetime as quickly as possible, especially across function
boundaries.

Given a function, any output lifetimes that don't derive from inputs are
unbounded. For instance:

```rust
fn get_str<'a>() -> &'a str;
```

will produce an `&str` with an unbounded lifetime. The easiest way to avoid
unbounded lifetimes is to use lifetime elision at the function boundary.
If an output lifetime is elided, then it *must* be bounded by an input lifetime.
Of course, it might be bounded by the *wrong* lifetime, but this will usually
just cause a compiler error, rather than allow memory safety to be trivially
violated.

Within a function, bounding lifetimes is more error-prone. The safest route
is to just use a small function to ensure the lifetime is bound. However if
this is unacceptable, the reference can be placed in a location with a specific
lifetime. Unfortunately it's impossible to name all lifetimes involved in a
function. To get around this, you can in principle use `copy_lifetime`, though
these are unstable due to their awkward nature and questionable utility.
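Here is a sketch of the "small function" approach, assuming we hold a raw pointer that we know points into a slice we also have a reference to. Because the output lifetime is elided and `owner` is the only input with a lifetime (raw pointers carry none), the reference produced by dereferencing the raw pointer is bounded by the borrow of `owner` instead of being unbounded. The name `deref_within` is hypothetical.

```rust
/// Turns a raw pointer into a reference whose lifetime is bounded by `owner`.
///
/// # Safety
///
/// `ptr` must point to a live element of `owner`.
// Hypothetical helper. With the output lifetime elided, the single input
// lifetime (that of `owner`) is assigned to the returned reference.
unsafe fn deref_within<T>(owner: &[T], ptr: *const T) -> &T {
    // Cheap sanity check in debug builds; the real guarantee is the caller's.
    debug_assert!(owner.as_ptr_range().contains(&ptr));
    unsafe { &*ptr }
}

fn main() {
    let data = vec![10u32, 20, 30];
    let raw: *const u32 = &data[1];
    // `second` borrows `data`, so the compiler won't let it outlive `data`.
    // A bare `unsafe { &*raw }` here would instead have an unbounded lifetime.
    let second = unsafe { deref_within(data.as_slice(), raw) };
    assert_eq!(*second, 20);
}
```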

## Subtyping and Variance

Although Rust doesn't have any notion of inheritance, it *does* include subtyping.
In Rust, subtyping derives entirely from *lifetimes*. Since lifetimes are derived
from scopes, we can partially order them based on an *outlives* relationship. We
can even express this as a generic bound: `T: 'a` specifies that `T` *outlives* `'a`.

We can then define subtyping on lifetimes in terms of that relationship: `'a: 'b` implies
`'a <: 'b` -- if `'a` outlives `'b`, then `'a` is a subtype of `'b`. This is a very
large source of confusion, because a bigger scope is a *subtype* of a smaller scope.
This does in fact make sense. The intuitive reason for this is that if you expect an
`&'a u8`, then it's totally fine for me to hand you an `&'static u8`, in the same way
that if you expect an Animal in Java, it's totally fine for me to hand you a Cat.
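That intuition can be checked directly. In this small sketch (`pick_first` is a hypothetical example function), both parameters have type `&'a str` for a single caller-chosen `'a`, and we pass an `&'static str` literal alongside a shorter-lived borrow. The compiler accepts the call because `&'static str` is a subtype of `&'a str`.

```rust
// Both arguments must share one lifetime 'a.
fn pick_first<'a>(first: &'a str, _second: &'a str) -> &'a str {
    first
}

fn main() {
    let greeting: &'static str = "hello";
    let local = String::from("world");
    // 'a is inferred to be no longer than the borrow of `local`; the
    // `&'static str` is accepted where an `&'a str` is expected.
    let chosen = pick_first(greeting, &local);
    assert_eq!(chosen, "hello");
}
```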
|
||||
|
||||
Variance is where things get really harsh.
|
||||
|
||||
Variance is a property that *type constructors* have. A type constructor in Rust
|
||||
is a generic type with unbound arguments. For instance `Vec` is a type constructor
|
||||
that takes a `T` and returns a `Vec<T>`. `&` and `&mut` are type constructors that
|
||||
take a lifetime and a type.
|
||||
|
||||
A type constructor's *variance* is how the subtypes of its inputs affects the
|
||||
subtypes of its outputs. There are three kinds of variance:
|
||||
|
||||
* F is *covariant* if `T <: U` implies `F<T> <: F<U>`
|
||||
* F is *contravariant* if `T <: U` implies `F<U> <: F<T>`
|
||||
* F is *invariant* otherwise (no subtyping relation can be derived)
|
||||
|
||||
Some important variances:
|
||||
|
||||
* `&` is covariant (as is *const by metaphor)
|
||||
* `&mut` is invariant (as is *mut by metaphor)
|
||||
* `Fn(T)` is contravariant with respect to `T`
|
||||
* `Box`, `Vec`, and all other collections are covariant
|
||||
* `UnsafeCell`, `Cell`, `RefCell`, `Mutex` and all "interior mutability"
|
||||
types are invariant
|
||||
|
||||
To understand why these variances are correct and desirable, we will consider several
|
||||
examples. We have already covered why `&` should be covariant.
|
||||
|
||||
To see why `&mut` should be invariant, consider the following code:
|
||||
|
||||
```rust
|
||||
fn main() {
|
||||
let mut forever_str: &'static str = "hello";
|
||||
{
|
||||
let string = String::from("world");
|
||||
overwrite(&mut forever_str, &mut &*string);
|
||||
}
|
||||
println!("{}", forever_str);
|
||||
}
|
||||
|
||||
fn overwrite<T: Copy>(input: &mut T, new: &mut T) {
|
||||
*input = *new;
|
||||
}
|
||||
```
|
||||
|
||||
The signature of `overwrite` is clearly valid: it takes mutable references to two values
|
||||
of the same type, and replaces one with the other. We have seen already that `&` is
|
||||
covariant, and `'static` is a subtype of *any* `'a', so `&'static str` is a
|
||||
subtype of `&'a str`. Therefore, if `&mut` was
|
||||
*also* covariant, then the lifetime of the `&'static str` would successfully be
|
||||
"shrunk" down to the shorter lifetime of the string, and `replace` would be
|
||||
called successfully. The string would subsequently be dropped, and `forever_str`
|
||||
would point to freed memory when we print it!
|
||||
|
||||
Therefore `&mut` should be invariant. This is the general theme of covariance vs
|
||||
invariance: if covariance would allow you to *store* a short-lived value in a
|
||||
longer-lived slot, then you must be invariant.
|
||||
|
||||
`Box` and `Vec` are interesting cases because they're covariant, but you can
definitely store values in them! This is fine because *you can only store values
in them through a mutable reference*! The mutable reference makes the whole type
invariant, and therefore prevents you from getting in trouble.

Because `Box` and `Vec` are covariant, shared references to them are covariant too
(so you can pass a `&Box<&'static str>` where a `&Box<&'a str>` is expected). Covariance
also allows you to permanently weaken the type by moving it into a weaker slot. That
is, you can do:

```rust
fn get_box<'a>(_: &'a u8) -> Box<&'a str> {
    // `boxed` has type `Box<&'static str>`; returning it moves it into a
    // `Box<&'a str>` slot, which covariance permits.
    let boxed: Box<&'static str> = Box::new("hello");
    boxed
}
```

which is fine because unlike the mutable borrow case, there's no one else who
"remembers" the old lifetime in the box.
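Both uses of covariance described above can be checked directly. In this sketch the functions `total_len`, `shared_box_demo` and `weaken` are hypothetical: the first two only read through shared references to boxes, while `weaken` moves a `Vec<&'static str>` into a `Vec<&'a str>` slot, mirroring `get_box`.

```rust
/// Hypothetical: requires both boxes to hold `&'a str` for the *same* `'a`.
fn total_len<'a>(x: &Box<&'a str>, y: &Box<&'a str>) -> usize {
    x.len() + y.len()
}

/// Passes a `&Box<&'static str>` where a `&Box<&'a str>` is expected:
/// `&` and `Box` are both covariant, so the `'static` box is accepted.
fn shared_box_demo(s: &String) -> usize {
    let long: Box<&'static str> = Box::new("hello");
    let short: Box<&str> = Box::new(s.as_str());
    total_len(&long, &short)
}

/// Hypothetical: moves a `Vec<&'static str>` into a weaker `Vec<&'a str>`
/// slot. Nothing else remembers the old `'static` lifetime, so this is sound.
fn weaken<'a>(v: Vec<&'static str>, _witness: &'a str) -> Vec<&'a str> {
    v
}
```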
The variance of the cell types follows similarly. For a cell, `&` acts like an `&mut`,
because you can still store values in a cell through an `&`. Therefore cells
must be invariant to avoid lifetime smuggling.
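A short sketch of the point: `Cell::set` takes `&self`, so a shared reference is enough to store a new value. The helpers `store` and `cell_demo` are hypothetical, written only to show where invariance matters.

```rust
use std::cell::Cell;

/// Hypothetical helper: stores `value` through a *shared* reference.
fn store<'a>(slot: &Cell<&'a str>, value: &'a str) {
    slot.set(value);
}

fn cell_demo() -> &'static str {
    let cell: Cell<&'static str> = Cell::new("hello");
    // Fine: `'a` is `'static` here, so nothing is shortened.
    store(&cell, "world");
    // If `Cell` were covariant, `&cell` could also be passed as a
    // `&Cell<&'short str>` together with a short-lived `value`, and `cell`
    // would then hold a dangling `&'static str`. Because `Cell<T>` is
    // invariant over `T`, the compiler rejects such a call.
    cell.get()
}
```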
`Fn` is the most confusing case, largely because contravariance is easily the
most confusing kind of variance, and basically never comes up. To understand it,
consider a function `len` that takes a function `F`:

```rust
fn len<F>(func: F) -> usize
    where F: Fn(&'static str) -> usize
{
    func("hello")
}
```

We require that `F` is a `Fn` that can take an `&'static str` and return a `usize`. Now
say we have a function that can take an `&'a str` (for *some* `'a`). Such a function actually
accepts *more* inputs, since `&'static str` is a subtype of `&'a str`. Therefore
`len` should happily accept such a function!

So a `Fn(&'a str)` is a subtype of a `Fn(&'static str)` because
`&'static str` is a subtype of `&'a str`. This is exactly contravariance.
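The same relationship can be observed with function pointers, whose argument types are contravariant. In this sketch `forward` and `count_bytes` are hypothetical; `len` is the function from above.

```rust
fn len<F>(func: F) -> usize
    where F: Fn(&'static str) -> usize
{
    func("hello")
}

/// Hypothetical: accepts a function pointer that only promises to handle
/// `&'a str` for one particular `'a`.
fn forward<'a>(f: fn(&'a str) -> usize) -> usize {
    // `fn(&'a str) -> usize` is a subtype of `fn(&'static str) -> usize`,
    // because `&'static str` is a subtype of `&'a str`.
    let g: fn(&'static str) -> usize = f;
    len(g)
}

fn count_bytes(s: &str) -> usize {
    s.len()
}
```

Assigning in the other direction, from a `fn(&'static str) -> usize` to a `fn(&'a str) -> usize`, would be rejected whenever `'a` is shorter than `'static`.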
The variance of `*const` and `*mut` is basically arbitrary as they're not at all
type or memory safe, so their variance is determined in analogy to `&` and `&mut`
respectively.
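A minimal check of that analogy, using a hypothetical function `shrink_const`: shortening the pointee's lifetime compiles for `*const`, while the commented-out `*mut` version would not.

```rust
/// Compiles: `*const T` is covariant over `T`, like `&T`.
fn shrink_const<'a>(p: *const &'static str) -> *const &'a str {
    p
}

// The `*mut` analogue is rejected, because `*mut T` is invariant over `T`,
// like `&mut T`:
// fn shrink_mut<'a>(p: *mut &'static str) -> *mut &'a str { p }
```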
## PhantomData and PhantomFn

This is all well and good for the types the standard library provides, but
how is variance determined for types that *you* define? The variance of a type
over its generic arguments is determined by how they're stored.

```rust
use std::cell::Cell;

struct Foo<'a, 'b, A, B, C, D, E, F, G, H> {
    a: &'a A,     // covariant over 'a and A
    b: &'b mut B, // covariant over 'b, invariant over B
    c: *const C,  // covariant over C
    d: *mut D,    // invariant over D
    e: Vec<E>,    // covariant over E
    f: Cell<F>,   // invariant over F
    g: G,         // covariant over G
    h1: H,        // would also be covariant over H except...
    h2: Cell<H>,  // invariant over H, because invariance wins
}
```
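One way to see these rules in action is to write a function that tries to shorten a lifetime parameter: it compiles only if the type is covariant over that parameter. The structs `Reader` and `Slot` and the function `shrink_reader` are hypothetical, chosen to mirror fields `a` and `f` of `Foo`.

```rust
use std::cell::Cell;

/// Hypothetical: stores only a shared reference, so it is covariant over 'a and T.
struct Reader<'a, T: 'a> {
    r: &'a T,
}

/// Hypothetical: the `Cell` makes it invariant over T.
#[allow(dead_code)]
struct Slot<T> {
    c: Cell<T>,
}

/// Compiles: a `Reader<'static, &'static str>` may be used where a
/// `Reader<'a, &'a str>` is expected, because `Reader` is covariant.
fn shrink_reader<'a>(r: Reader<'static, &'static str>) -> Reader<'a, &'a str> {
    r
}

// The analogous function for `Slot` is rejected, because `Slot` is
// invariant over `T`:
// fn shrink_slot<'a>(s: Slot<&'static str>) -> Slot<&'a str> { s }
```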
However, when working with unsafe code, we can often end up in a situation where
types or lifetimes are logically associated with a struct, but not actually
reachable. This most commonly occurs with lifetimes. For instance, the `Iter`
for `&'a [T]` is (approximately) defined as follows:

```
// Rejected by the compiler: parameter `'a` is never used (E0392)
pub struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
}
```

However, because `'a` is unused within the struct's body, it's *unbound*.
Because of the troubles this has historically caused, unbound lifetimes and
types are *illegal* in struct definitions. Therefore we must somehow refer
to these types in the body.

We do this using *PhantomData*, which is a special marker type. PhantomData
consumes no space, but simulates a field of the given type for the purpose of
variance. This was deemed to be less error-prone than explicitly telling the
type system the kind of variance that you want.

`Iter` logically contains `&'a T`, so this is exactly what we tell
the PhantomData to simulate:

```
use std::marker;

pub struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: marker::PhantomData<&'a T>,
}
```
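To show the marker at work, here is a simplified, hypothetical iterator built on the struct above. It is a sketch, not the standard library's implementation; in particular it does not handle zero-sized `T` (for those it simply yields nothing, because `ptr.add(len)` does not move the pointer).

```rust
use std::marker::PhantomData;

pub struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    // Makes `Iter` act as if it held a `&'a T`: covariant over 'a and T,
    // and tied to the lifetime of the borrowed slice.
    _marker: PhantomData<&'a T>,
}

impl<'a, T> Iter<'a, T> {
    pub fn new(slice: &'a [T]) -> Self {
        let ptr = slice.as_ptr();
        // SAFETY: one-past-the-end of the slice is a valid offset.
        let end = unsafe { ptr.add(slice.len()) };
        Iter { ptr, end, _marker: PhantomData }
    }
}

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.ptr == self.end {
            None
        } else {
            // SAFETY: `ptr` is in bounds, and the slice it points into was
            // borrowed for `'a` in `new`.
            let item = unsafe { &*self.ptr };
            self.ptr = unsafe { self.ptr.add(1) };
            Some(item)
        }
    }
}
```

Without `_marker`, the definition would not compile at all; with it, `Iter<'a, T>` borrows its slice for `'a` just as a `&'a T` field would.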
## Splitting Lifetimes

The mutual exclusion property of mutable references can be very limiting when
working with a composite structure. Borrowck understands some basic stuff, but
will fall over pretty easily. It does understand structs well enough to know that
disjoint fields of a struct can be borrowed simultaneously.
So this works today:

```rust
struct Foo {
    a: i32,
    b: i32,
    c: i32,
}

let mut x = Foo {a: 0, b: 0, c: 0};
let a = &mut x.a;
let b = &mut x.b;
let c = &x.c;
*b += 1;
let c2 = &x.c;
*a += 10;
println!("{} {} {} {}", a, b, c, c2);
```
However, borrowck doesn't understand arrays or slices in any way, so this doesn't
work:

```rust
let mut x = [1, 2, 3];
let a = &mut x[0];
let b = &mut x[1];
println!("{} {}", a, b);
```

The compiler rejects `&mut x[1]` as a second mutable borrow of `x` (error E0499),
even though `x[0]` and `x[1]` never overlap.

While it is plausible that borrowck could understand this simple case, it's
pretty clearly hopeless for borrowck to understand disjointness in general
container types like a tree, especially if distinct keys actually *do* map
to the same value.

In order to "teach" borrowck that what we're doing is ok, we need to drop down
to unsafe code. For instance, mutable slices expose a `split_at_mut` function that
consumes the slice and returns *two* mutable slices: one for everything to the
left of the index, and one for everything to the right. Intuitively we know this
is safe because the slices don't alias. However, the implementation requires some
unsafety:

```rust
// A method on `[T]`; `mem` and `ops` refer to `core::mem` and `core::ops`.
fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
    unsafe {
        let self2: &mut [T] = mem::transmute_copy(&self);

        (ops::IndexMut::index_mut(self, ops::RangeTo { end: mid } ),
         ops::IndexMut::index_mut(self2, ops::RangeFrom { start: mid } ))
    }
}
```
This is pretty plainly dangerous. We use `transmute_copy` to duplicate the slice with an
*unbounded* lifetime, so that it can be borrowed mutably at the same time as `self`.
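For comparison, here is a minimal sketch of the same split written as a hypothetical free function, `split_at_mut_raw`, that swaps `transmute_copy` for raw pointers and `std::slice::from_raw_parts_mut`. Unlike the version above, both returned slices get their lifetime from the single input borrow `v` rather than an unbounded one.

```rust
use std::slice;

/// Hypothetical free-function version of `split_at_mut`.
/// Panics if `mid > v.len()`.
fn split_at_mut_raw<T>(v: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = v.len();
    assert!(mid <= len, "mid out of bounds");
    let ptr = v.as_mut_ptr();
    // SAFETY: `[0, mid)` and `[mid, len)` are in bounds and disjoint, and both
    // come from the single exclusive borrow `v`, so the returned slices never
    // alias. Their lifetimes are tied to `v` by the signature.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

Callers would normally use the standard `split_at_mut` method instead. Applied to the earlier array example, `let (left, right) = x.split_at_mut(1);` gives mutable access to `left[0]` and `right[0]` (that is, `x[0]` and `x[1]`) at the same time.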
* Splitting lifetimes into disjoint regions
* Creating lifetimes from raw pointers