You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
nomicon/src/subtyping.md

448 lines
18 KiB

# Subtyping and Variance
Rust uses lifetimes to track the relationships between borrows and ownership.
However, a naive implementation of lifetimes would be either too restrictive,
3 years ago
or permit undefined behavior.
3 years ago
In order to allow flexible usage of lifetimes
while also preventing their misuse, Rust uses a combination of **Subtyping** and **Variance**.
3 years ago
## Subtyping
Subtyping is the idea that one type can be a *subtype* of another.
Let's define that `Sub` is a subtype of `Super` (we'll be using the notation `Sub: Super` throughout this chapter)
What this is suggesting to us is that the set of *requirements* that `Super` defines
are completely satisfied by `Sub`. `Sub` may then have more requirements.
3 years ago
An example of simple subtyping that exists in the language are [supertraits](https://doc.rust-lang.org/stable/book/ch19-03-advanced-traits.html?highlight=supertraits#using-supertraits-to-require-one-traits-functionality-within-another-trait)
```rust
use std::fmt;
trait OutlinePrint: fmt::Display {
fn outline_print(&self) {
todo!()
}
}
```
Here, we have that `OutlinePrint: fmt::Display` (`OutlinePrint` is a *subtype* of `Display`),
because it has all the requirements of `fmt::Display`, plus the `outline_print` function.
However, subtyping in traits is not that interesting in the case of Rust.
Here in the nomicon, we're going to focus more with how subtyping interacts with **lifetimes**
Take this example
```rust
fn debug<T: std::fmt::Debug>(a: T, b: T) {
println!("a = {:?} b = {:?}", a, b);
}
fn main() {
let hello: &'static str = "hello";
{
let world = String::from("world");
let world = &world; // 'b has a shorter lifetime than 'static
debug(hello, world);
}
}
```
In an overly restrictive implementation of lifetimes, since `a` and `b` have differeing lifetimes,
we might see the following error:
```text
error[E0308]: mismatched types
3 years ago
--> src/main.rs:10:16
|
10 | debug(hello, world);
| ^
| |
| expected `&'static str`, found struct `&'b str`
```
3 years ago
This is over-restrictive. In this case, what we want is to accept any type that lives *at least as long* as `'b`.
Let's try using subtyping with our lifetimes.
3 years ago
Let's define a lifetime to have the a simple set of requirements: `'a` defines a region of code in which a value will be alive.
Now that we have a defined set of requirements for lifetimes, we can define how they relate to each other.
`'a: 'b` if and only if `'a` defines a region of code that **completely contains** `'b`.
3 years ago
`'a` may define a region larger than `'b`, but that still fits our definition.
Going back to our example above, we can say that `'static: 'b`.
For now, let's accept the idea that subtypes of lifetimes can be transitive (more on this in [Variance](#variance)),
3 years ago
eg. `&'static str` is a subtype of `&'b str`, then we can let them coerce, and then the example above will compile
```rust
3 years ago
fn debug<T: std::fmt::Debug>(a: T, b: T) {
println!("a = {:?} b = {:?}", a, b);
}
3 years ago
fn main() {
let hello: &'static str = "hello";
3 years ago
{
let world = String::from("world");
let world = &world; // 'b has a shorter lifetime than 'static
debug(hello, world); // a silently converts from `&'static str` into `&'b str`
3 years ago
}
}
```
3 years ago
## Variance
3 years ago
Above, we glossed over the fact that `'static: 'b` implied that `&'static T: &'b T`. This uses a property known as variance.
It's not always as simple as this example though, to understand that let's try extend this example a bit
3 years ago
```rust,compile_fail
fn assign<T>(input: &mut T, val: T) {
*input = val;
}
fn main() {
let mut hello: &'static str = "hello";
3 years ago
{
let world = String::from("world");
assign(&mut hello, &world);
3 years ago
}
}
```
If this were to compile, this would have a memory bug.
3 years ago
If we were to expand this out, we'd see that we're trying to assign a `&'b str` into a `&'static str`,
but the problem is that as soon as `b` goes out of scope, `a` is now invalid, even though it's supposed to have a `'static` lifetime.
However, the implementation of `assign` is valid.
3 years ago
Therefore, this must mean that `&mut &'static str` should **not** a *subtype* of `&mut &'b str`,
even if `'static` is a subtype of `'b`.
3 years ago
Variance is the way that Rust defines the transitivity of subtypes through their *type constructor*.
A type constructor in Rust is any generic type with unbound arguments.
6 years ago
For instance `Vec` is a type constructor that takes a type `T` and returns
`Vec<T>`. `&` and `&mut` are type constructors that take two inputs: a
lifetime, and a type to point to.
> NOTE: For convenience we will often refer to `F<T>` as a type constructor just so
> that we can easily talk about `T`. Hopefully this is clear in context.
A type constructor F's *variance* is how the subtyping of its inputs affects the
subtyping of its outputs. There are three kinds of variance in Rust. Given two
types `Sub` and `Super`, where `Sub` is a subtype of `Super`:
3 years ago
* F is **covariant** if `F<Sub>` is a subtype of `F<Super>` (the subtype property is passed through)
* F is **contravariant** if `F<Super>` is a subtype of `F<Sub>` (the subtype property is "inverted")
* F is **invariant** otherwise (no subtyping relationship exists)
3 years ago
If we remember from the above examples,
it was ok for us to treat `&'a T` as a subtype of `&'b T` if `'a: 'b`,
therefore we can say that `&'a T` is *covariant* over `'a`.
3 years ago
Also, we saw that it was not ok for us to treat `&mut &'a T` as a subtype of `&mut &'b T`,
therefore we can say that `&mut T` is *invariant* over `T`
3 years ago
Here is a table of some other type constructors and their variances:
6 years ago
| | | 'a | T | U |
|---|-----------------|:---------:|:-----------------:|:---------:|
| | `&'a T ` | covariant | covariant | |
| | `&'a mut T` | covariant | invariant | |
| | `Box<T>` | | covariant | |
6 years ago
| | `Vec<T>` | | covariant | |
| | `UnsafeCell<T>` | | invariant | |
6 years ago
| | `Cell<T>` | | invariant | |
| | `fn(T) -> U` | | **contra**variant | covariant |
6 years ago
| | `*const T` | | covariant | |
| | `*mut T` | | invariant | |
Some of these can be explained simply in relation to the others:
* `Vec<T>` and all other owning pointers and collections follow the same logic as `Box<T>`
* `Cell<T>` and all other interior mutability types follow the same logic as `UnsafeCell<T>`
* `UnsafeCell<T>` having interior mutability gives it the same variance properties as `&mut T`
* `*const T` follows the logic of `&T`
* `*mut T` follows the logic of `&mut T` (or `UnsafeCell<T>`)
For more types, see the ["Variance" section][variance-table] on the reference.
[variance-table]: ../reference/subtyping.html#variance
> NOTE: the *only* source of contravariance in the language is the arguments to
> a function, which is why it really doesn't come up much in practice. Invoking
> contravariance involves higher-order programming with function pointers that
> take references with specific lifetimes (as opposed to the usual "any lifetime",
6 years ago
> which gets into higher rank lifetimes, which work independently of subtyping).
Now that we have some more formal understanding of variance,
let's go through some more examples in more detail.
```rust,compile_fail
fn assign<T>(input: &mut T, val: T) {
*input = val;
}
fn main() {
let mut hello: &'static str = "hello";
{
let world = String::from("world");
assign(&mut hello, &world);
}
}
```
And what do we get when we run this?
```text
error[E0597]: `world` does not live long enough
--> src/main.rs:9:28
|
6 | let mut hello: &'static str = "hello";
| ------------ type annotation requires that `world` is borrowed for `'static`
...
9 | assign(&mut hello, &world);
| ^^^^^^ borrowed value does not live long enough
10 | }
| - `world` dropped here while still borrowed
```
Good, it doesn't compile! Let's break down what's happening here in detail.
First let's look at the `assign` function:
```rust
fn assign<T>(input: &mut T, val: T) {
*input = val;
}
```
All it does is take a mutable reference and a value and overwrite the referent with it.
What's important about this function is that it creates a type equality constraint. It
clearly says in its signature the referent and the value must be the *exact same* type.
Meanwhile, in the caller we pass in `&mut &'static str` and `&'spike_str str`.
Because `&mut T` is invariant over `T`, the compiler concludes it can't apply any subtyping
to the first argument, and so `T` must be exactly `&'static str`.
This is counter to the `&T` case
```rust
fn debug<T: std::fmt::Debug>(a: T, b: T) {
println!("a = {:?} b = {:?}", a, b);
}
```
Where similarly `a` and `b` must have the same type `T`.
But since `&'a T` *is* covariant over `'a`, we are allowed to perform subtyping.
So the compiler decides that `&'static str` can become `&'b str` if and only if
`&'static str` is a subtype of `&'b str`, which will hold if `'static: 'b`.
This is true, so the compiler is happy to continue compiling this code.
---
First off, let's revisit the meowing dog example:
<!-- ignore: simplified code -->
```rust,ignore
fn evil_feeder(pet: &mut Animal) {
let spike: Dog = ...;
// `pet` is an Animal, and Dog is a subtype of Animal,
// so this should be fine, right..?
*pet = spike;
}
fn main() {
let mut mr_snuggles: Cat = ...;
evil_feeder(&mut mr_snuggles); // Replaces mr_snuggles with a Dog
mr_snuggles.meow(); // OH NO, MEOWING DOG!
}
```
If we look at our table of variances, we see that `&mut T` is *invariant* over `T`.
As it turns out, this completely fixes the issue! With invariance, the fact that
Cat is a subtype of Animal doesn't matter; `&mut Cat` still won't be a subtype of
`&mut Animal`. The static type checker will then correctly stop us from passing
a Cat into `evil_feeder`.
The soundness of subtyping is based on the idea that it's ok to forget unnecessary
details. But with references, there's always someone that remembers those details:
the value being referenced. That value expects those details to keep being true,
and may behave incorrectly if its expectations are violated.
The problem with making `&mut T` covariant over `T` is that it gives us the power
to modify the original value *when we don't remember all of its constraints*.
And so, we can make someone have a Dog when they're certain they still have a Cat.
With that established, we can easily see why `&T` being covariant over `T` *is*
6 years ago
sound: it doesn't let you modify the value, only look at it. Without any way to
mutate, there's no way for us to mess with any details. We can also see why
6 years ago
`UnsafeCell` and all the other interior mutability types must be invariant: they
make `&T` work like `&mut T`!
Now what about the lifetime on references? Why is it ok for both kinds of references
6 years ago
to be covariant over their lifetimes? Well, here's a two-pronged argument:
First and foremost, subtyping references based on their lifetimes is *the entire point
of subtyping in Rust*. The only reason we have subtyping is so we can pass
long-lived things where short-lived things are expected. So it better work!
6 years ago
Second, and more seriously, lifetimes are only a part of the reference itself. The
type of the referent is shared knowledge, which is why adjusting that type in only
one place (the reference) can lead to issues. But if you shrink down a reference's
lifetime when you hand it to someone, that lifetime information isn't shared in
any way. There are now two independent references with independent lifetimes.
2 years ago
There's no way to mess with the original reference's lifetime using the other one.
6 years ago
Or rather, the only way to mess with someone's lifetime is to build a meowing dog.
But as soon as you try to build a meowing dog, the lifetime should be wrapped up
in an invariant type, preventing the lifetime from being shrunk. To understand this
better, let's port the meowing dog problem over to real Rust.
In the meowing dog problem we take a subtype (Cat), convert it into a supertype
(Animal), and then use that fact to overwrite the subtype with a value that satisfies
the constraints of the supertype but not the subtype (Dog).
So with lifetimes, we want to take a long-lived thing, convert it into a
short-lived thing, and then use that to write something that doesn't live long
enough into the place expecting something long-lived.
Here it is:
6 years ago
The other argument is only an `&'a str`, which *is* covariant over `'a`. So the compiler
adopts a constraint: `&'spike_str str` must be a subtype of `&'static str` (inclusive),
which in turn implies `'spike_str` must be a subtype of `'static` (inclusive). Which is to say,
`'spike_str` must contain `'static`. But only one thing contains `'static` -- `'static` itself!
6 years ago
This is why we get an error when we try to assign `&spike` to `spike_str`. The
compiler has worked backwards to conclude `spike_str` must live forever, and `&spike`
simply can't live that long.
So even though references are covariant over their lifetimes, they "inherit" invariance
whenever they're put into a context that could do something bad with that. In this case,
we inherited invariance as soon as we put our reference inside an `&mut T`.
As it turns out, the argument for why it's ok for Box (and Vec, Hashmap, etc.) to
6 years ago
be covariant is pretty similar to the argument for why it's ok for
references to be covariant: as soon as you try to stuff them in something like a
mutable reference, they inherit invariance and you're prevented from doing anything
6 years ago
bad.
However, Box makes it easier to focus on the by-value aspect of references that we
partially glossed over.
Unlike a lot of languages which allow values to be freely aliased at all times,
Rust has a very strict rule: if you're allowed to mutate or move a value, you
are guaranteed to be the only one with access to it.
Consider the following code:
<!-- ignore: simplified code -->
```rust,ignore
let mr_snuggles: Box<Cat> = ..;
let spike: Box<Dog> = ..;
let mut pet: Box<Animal>;
pet = mr_snuggles;
pet = spike;
```
There is no problem at all with the fact that we have forgotten that `mr_snuggles` was a Cat,
or that we overwrote him with a Dog, because as soon as we moved mr_snuggles to a variable
6 years ago
that only knew he was an Animal, **we destroyed the only thing in the universe that
remembered he was a Cat**!
In contrast to the argument about immutable references being soundly covariant because they
don't let you change anything, owned values can be covariant because they make you
change *everything*. There is no connection between old locations and new locations.
Applying by-value subtyping is an irreversible act of knowledge destruction, and
without any memory of how things used to be, no one can be tricked into acting on
that old information!
6 years ago
Only one thing left to explain: function pointers.
To see why `fn(T) -> U` should be covariant over `U`, consider the following signature:
<!-- ignore: simplified code -->
```rust,ignore
fn get_animal() -> Animal;
```
This function claims to produce an Animal. As such, it is perfectly valid to
provide a function with the following signature instead:
<!-- ignore: simplified code -->
```rust,ignore
fn get_animal() -> Cat;
```
After all, Cats are Animals, so always producing a Cat is a perfectly valid way
to produce Animals. Or to relate it back to real Rust: if we need a function
that is supposed to produce something that lives for `'short`, it's perfectly
fine for it to produce something that lives for `'long`. We don't care, we can
just forget that fact.
However, the same logic does not apply to *arguments*. Consider trying to satisfy:
<!-- ignore: simplified code -->
```rust,ignore
fn handle_animal(Animal);
```
with:
<!-- ignore: simplified code -->
```rust,ignore
fn handle_animal(Cat);
```
The first function can accept Dogs, but the second function absolutely can't.
Covariance doesn't work here. But if we flip it around, it actually *does*
work! If we need a function that can handle Cats, a function that can handle *any*
Animal will surely work fine. Or to relate it back to real Rust: if we need a
function that can handle anything that lives for at least `'long`, it's perfectly
fine for it to be able to handle anything that lives for at least `'short`.
And that's why function types, unlike anything else in the language, are
**contra**variant over their arguments.
Now, this is all well and good for the types the standard library provides, but
how is variance determined for types that *you* define? A struct, informally
speaking, inherits the variance of its fields. If a struct `MyType`
has a generic argument `A` that is used in a field `a`, then MyType's variance
6 years ago
over `A` is exactly `a`'s variance over `A`.
However if `A` is used in multiple fields:
* If all uses of `A` are covariant, then MyType is covariant over `A`
* If all uses of `A` are contravariant, then MyType is contravariant over `A`
* Otherwise, MyType is invariant over `A`
```rust
use std::cell::Cell;
struct MyType<'a, 'b, A: 'a, B: 'b, C, D, E, F, G, H, In, Out, Mixed> {
a: &'a A, // covariant over 'a and A
b: &'b mut B, // covariant over 'b and invariant over B
c: *const C, // covariant over C
d: *mut D, // invariant over D
e: E, // covariant over E
f: Vec<F>, // covariant over F
g: Cell<G>, // invariant over G
h1: H, // would also be covariant over H except...
h2: Cell<H>, // invariant over H, because invariance wins all conflicts
i: fn(In) -> Out, // contravariant over In, covariant over Out
k1: fn(Mixed) -> usize, // would be contravariant over Mixed except..
k2: Mixed, // invariant over Mixed, because invariance wins all conflicts
}
```