pull/10/head
Alexis Beingessner 10 years ago committed by Manish Goregaokar
parent 8e11d0e637
commit 736f072c13

@ -1 +1,3 @@
# turpl # The Unsafe Rust Programming Language (Book)
[Start at the intro](http://www.cglab.ca/~abeinges/blah/turpl/intro.html)

@ -5,8 +5,6 @@ are just there to help us use those bits right. Needing to reinterpret those pil
of bits as different types is a common problem and Rust consequently gives you of bits as different types is a common problem and Rust consequently gives you
several ways to do that. several ways to do that.
# Safe Rust
First we'll look at the ways that *Safe Rust* gives you to reinterpret values. The First we'll look at the ways that *Safe Rust* gives you to reinterpret values. The
most trivial way to do this is to just destructure a value into its constituent most trivial way to do this is to just destructure a value into its constituent
parts and then build a new type out of them. e.g. parts and then build a new type out of them. e.g.
@ -31,42 +29,191 @@ fn reinterpret(foo: Foo) -> Bar {
But this is, at best, annoying to do. For common conversions, rust provides But this is, at best, annoying to do. For common conversions, rust provides
more ergonomic alternatives. more ergonomic alternatives.
## Auto-Deref
# Auto-Deref
(Maybe nix this in favour of receiver coercions)
Deref is a trait that allows you to overload the unary `*` to specify a type Deref is a trait that allows you to overload the unary `*` to specify a type
you dereference to. This is largely only intended to be implemented by pointer you dereference to. This is largely only intended to be implemented by pointer
types like `&`, `Box`, and `Rc`. The dot operator will automatically perform types like `&`, `Box`, and `Rc`. The dot operator will automatically perform
automatic dereferencing, so that foo.bar() will work uniformly on `Foo`, `&Foo`, `&&Foo`, automatic dereferencing, so that foo.bar() will work uniformly on `Foo`, `&Foo`, `
`&Rc<Box<&mut&Box<Foo>>>` and so-on. Search bottoms out on the *first* match, &&Foo`, `&Rc<Box<&mut&Box<Foo>>>` and so-on. Search bottoms out on the *first* match,
so implementing methods on pointers is generally to be avoided, as it will shadow so implementing methods on pointers is generally to be avoided, as it will shadow
"actual" methods. "actual" methods.
## Coercions
Types can implicitly be coerced to change in certain contexts. These changes are generally
just *weakening* of types, largely focused around pointers. They mostly exist to make
Rust "just work" in more cases. For instance
`&mut T` coerces to `&T`, and `&T` coerces to `*const T`. The most useful coercion you will
actually think about it is probably the general *Deref Coercion*: `&T` coerces to `&U` when
`T: Deref<U>`. This enables us to pass an `&String` where an `&str` is expected, for instance.
## Casts
Casts are a superset of coercions: every coercion can be explicitly invoked via a cast, # Coercions
but some changes require a cast. These "true casts" are generally regarded as dangerous or
problematic actions. True casts revolves around raw pointers and the primitive numeric Types can implicitly be coerced to change in certain contexts. These changes are
types. Here's an exhaustive list of all the true casts: generally just *weakening* of types, largely focused around pointers and lifetimes.
They mostly exist to make Rust "just work" in more cases, and are largely harmless.
Here's all the kinds of coercion:
Coercion is allowed between the following types:
* `T` to `U` if `T` is a [subtype](lifetimes.html#subtyping-and-variance)
of `U` (the 'identity' case);
* `T_1` to `T_3` where `T_1` coerces to `T_2` and `T_2` coerces to `T_3`
(transitivity case);
* `&mut T` to `&T`;
* `*mut T` to `*const T`;
* `&T` to `*const T`;
* `&mut T` to `*mut T`;
* `T` to `U` if `T` implements `CoerceUnsized<U>` (see below) and `T = Foo<...>`
and `U = Foo<...>`;
* From TyCtor(`T`) to TyCtor(coerce_inner(`T`));
where TyCtor(`T`) is one of `&T`, `&mut T`, `*const T`, `*mut T`, or `Box<T>`.
And where coerce_inner is defined as
* coerce_inner(`[T, ..n]`) = `[T]`;
* coerce_inner(`T`) = `U` where `T` is a concrete type which implements the
trait `U`;
* coerce_inner(`T`) = `U` where `T` is a sub-trait of `U`;
* coerce_inner(`Foo<..., T, ...>`) = `Foo<..., coerce_inner(T), ...>` where
`Foo` is a struct and only the last field has type `T` and `T` is not part of
the type of any other fields;
* coerce_inner(`(..., T)`) = `(..., coerce_inner(T))`.
Coercions only occur at a *coercion site*. Exhaustively, the coercion sites
are:
* In `let` statements where an explicit type is given: in `let _: U = e;`, `e`
is coerced to have type `U`;
* In statics and consts, similarly to `let` statements;
* In argument position for function calls. The value being coerced is the actual
parameter and it is coerced to the type of the formal parameter. For example,
where `foo` is defined as `fn foo(x: U) { ... }` and is called with `foo(e);`,
`e` is coerced to have type `U`;
* Where a field of a struct or variant is instantiated. E.g., where `struct Foo
{ x: U }` and the instantiation is `Foo { x: e }`, `e` is coerced to have
type `U`;
* The result of a function, either the final line of a block if it is not
semicolon-terminated or any expression in a `return` statement. For example, for
`fn foo() -> U { e }`, `e` is coerced to have type `U`.
TODO: gank the RFC for sweet casts If the expression in one of these coercion sites is a coercion-propagating
expression, then the relevant sub-expressions in that expression are also
coercion sites. Propagation recurses from these new coercion sites. Propagating
expressions and their relevant sub-expressions are:
For number -> number casts, there are quite a few cases to consider: * array literals, where the array has type `[U, ..n]`, each sub-expression in
the array literal is a coercion site for coercion to type `U`;
* array literals with repeating syntax, where the array has type `[U, ..n]`, the
repeated sub-expression is a coercion site for coercion to type `U`;
* tuples, where a tuple is a coercion site to type `(U_0, U_1, ..., U_n)`, each
sub-expression is a coercion site for the respective type, e.g., the zero-th
sub-expression is a coercion site to `U_0`;
* the box expression, if the expression has type `Box<U>`, the sub-expression is
a coercion site to `U`;
* parenthesised sub-expressions (`(e)`), if the expression has type `U`, then
the sub-expression is a coercion site to `U`;
* blocks, if a block has type `U`, then the last expression in the block (if it
is not semicolon-terminated) is a coercion site to `U`. This includes blocks
which are part of control flow statements, such as `if`/`else`, if the block
has a known type.
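A minimal sketch of a few of the coercion sites listed above: a `let` with an explicit type, an argument, a struct field, and a function result. The names `Holder`, `read`, and `as_slice` are made up for illustration and are not part of any library.

```rust
struct Holder<'a> {
    // Field instantiation is a coercion site.
    items: &'a [i32],
}

// Argument position is a coercion site: callers may pass `&mut i32`.
fn read(x: &i32) -> i32 {
    *x
}

// The function result is a coercion site: `&[i32; 3]` unsizes to `&[i32]`.
fn as_slice(arr: &[i32; 3]) -> &[i32] {
    arr
}

fn main() {
    let mut n = 5;
    let arr = [1, 2, 3];

    // `let` with an explicit type: `&mut i32` weakens to `&i32`.
    let shared: &i32 = &mut n;
    assert_eq!(*shared, 5);

    // `&i32` weakens further to `*const i32`.
    let raw: *const i32 = shared;
    assert!(!raw.is_null());

    // `&mut i32` is coerced to the formal parameter type `&i32`.
    assert_eq!(read(&mut n), 5);

    // `&[i32; 3]` is coerced to the field type `&[i32]`.
    let holder = Holder { items: &arr };
    assert_eq!(holder.items.len(), 3);
    assert_eq!(as_slice(&arr), &[1, 2, 3][..]);
}
```

None of these lines needs an `as`: the expected type at each site is enough to trigger the coercion.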
Note that we do not perform coercions when matching traits (except for
receivers, see below). If there is an impl for some type `U` and `T` coerces to
`U`, that does not constitute an implementation for `T`. For example, the
following will not type check, even though it is OK to coerce `t` to `&T` and
there is an impl for `&T`:
```
struct T;
trait Trait {}
fn foo<X: Trait>(t: X) {}
impl<'a> Trait for &'a T {}
fn main() {
let t: &mut T = &mut T;
foo(t); //~ ERROR failed to find an implementation of trait Trait for &mut T
}
```
In a cast expression, `e as U`, the compiler will first attempt to coerce `e` to
`U`, only if that fails will the conversion rules for casts (see below) be
applied.
# Casts
Casts are a superset of coercions: every coercion can be explicitly invoked via a
cast, but some conversions *require* a cast. These "true casts" are generally regarded
as dangerous or problematic actions. True casts revolve around raw pointers and
the primitive numeric types. True casts aren't checked.
Here's an exhaustive list of all the true casts:
* `e` has type `T` and `T` coerces to `U`; *coercion-cast*
* `e` has type `*T`, `U` is `*U_0`, and either `U_0: Sized` or
unsize_kind(`T`) = unsize_kind(`U_0`); *ptr-ptr-cast*
* `e` has type `*T` and `U` is a numeric type, while `T: Sized`; *ptr-addr-cast*
* `e` is an integer and `U` is `*U_0`, while `U_0: Sized`; *addr-ptr-cast*
* `e` has type `T` and `T` and `U` are any numeric types; *numeric-cast*
* `e` is a C-like enum and `U` is an integer type; *enum-cast*
* `e` has type `bool` or `char` and `U` is an integer; *prim-int-cast*
* `e` has type `u8` and `U` is `char`; *u8-char-cast*
* `e` has type `&[T; n]` and `U` is `*const T`; *array-ptr-cast*
* `e` is a function pointer type and `U` has type `*T`,
while `T: Sized`; *fptr-ptr-cast*
* `e` is a function pointer type and `U` is an integer; *fptr-addr-cast*
where `&.T` and `*T` are references of either mutability,
and where unsize_kind(`T`) is the kind of the unsize info
in `T` - the vtable for a trait definition (e.g. `fmt::Display` or
`Iterator`, not `Iterator<Item=u8>`) or a length (or `()` if `T: Sized`).
Note that lengths are not adjusted when casting raw slices -
`*const [u16] as *const [u8]` creates a slice that only includes
half of the original memory.
Casting is not transitive, that is, even if `e as U1 as U2` is a valid
expression, `e as U2` is not necessarily so (in fact it will only be valid if
`U1` coerces to `U2`).
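A short sketch of several of the true casts from the list, plus the non-transitivity point. `Level`, `level_code`, and `flag_as_float` are hypothetical names used only for this example.

```rust
#[derive(Clone, Copy)]
enum Level {
    Low = 1,
    High = 10,
}

// enum-cast: a C-like enum to an integer.
fn level_code(level: Level) -> u8 {
    level as u8
}

// prim-int-cast followed by numeric-cast. Writing `flag as f32` directly is
// rejected by the compiler, which is the non-transitivity described above.
fn flag_as_float(flag: bool) -> f32 {
    flag as u8 as f32
}

fn main() {
    assert_eq!(level_code(Level::Low), 1);
    assert_eq!(level_code(Level::High), 10);
    assert_eq!(flag_as_float(true), 1.0);

    // u8-char-cast.
    assert_eq!(97u8 as char, 'a');

    // array-ptr-cast, then a ptr-addr-cast and addr-ptr-cast round trip.
    let arr = [1u16, 2, 3];
    let ptr = &arr as *const u16;
    let addr = ptr as usize;
    let back = addr as *const u16;
    assert_eq!(ptr, back);
}
```

Everything here is safe code: creating and comparing raw pointers is fine, it is only dereferencing them that requires `unsafe`.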
For numeric casts, there are quite a few cases to consider:
* casting between two integers of the same size (e.g. i32 -> u32) is a no-op * casting between two integers of the same size (e.g. i32 -> u32) is a no-op
* casting from a larger integer to a smaller integer (e.g. u32 -> u8) will truncate * casting from a larger integer to a smaller integer (e.g. u32 -> u8) will truncate
* casting from a smaller integer to a larger integer (e.g. u8 -> u32) will * casting from a smaller integer to a larger integer (e.g. u8 -> u32) will
* zero-extend if unsigned * zero-extend if the source is unsigned
* sign-extend if signed * sign-extend if the source is signed
* casting from a float to an integer will round the float towards zero. * casting from a float to an integer will:
* round the float towards zero if finite
* **NOTE: currently this will cause Undefined Behaviour if the rounded * **NOTE: currently this will cause Undefined Behaviour if the rounded
value cannot be represented by the target integer type**. This is a bug value cannot be represented by the target integer type**. This is a bug
and will be fixed. and will be fixed.
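The integer and in-range float cases can be checked directly. This is only a sketch with made-up helper names. It deliberately stays away from out-of-range float values, since those are the problematic case noted above.

```rust
// Same size: the bits are kept and merely reinterpreted.
fn reinterpret(x: i32) -> u32 {
    x as u32
}

// Larger to smaller: the high bits are truncated.
fn truncate(x: u32) -> u8 {
    x as u8
}

// Smaller to larger with an unsigned source: zero-extended.
fn widen_unsigned(x: u8) -> i32 {
    x as i32
}

// Smaller to larger with a signed source: sign-extended.
fn widen_signed(x: i8) -> i32 {
    x as i32
}

// Float to integer: rounds towards zero for finite, in-range values.
fn float_to_int(x: f32) -> i32 {
    x as i32
}

fn main() {
    assert_eq!(reinterpret(-1), u32::MAX);
    assert_eq!(truncate(300), 44); // 300 = 256 + 44
    assert_eq!(widen_unsigned(255), 255);
    assert_eq!(widen_signed(-1), -1);
    assert_eq!(float_to_int(3.99), 3);
    assert_eq!(float_to_int(-3.99), -3);
}
```

Note that it is the signedness of the value being cast that decides between zero-extension and sign-extension: `-1i8 as u32` is `u32::MAX`, not `255`.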
@ -86,18 +233,14 @@ well as interpret integers as addresses. However it is impossible to actually
`unsafe`. `unsafe`.
## Conversion Traits
For full formal specification of all the kinds of coercions and coercion sites, see:
https://github.com/rust-lang/rfcs/blob/master/text/0401-coercions.md # Conversion Traits
TODO
* Coercions
* Casts
* Conversion Traits (Into/As/...)
# Unsafe Rust # Transmuting Types
* raw ptr casts
* mem::transmute

@ -1,6 +1,11 @@
% The Unsafe Rust Programming Language % The Unsafe Rust Programming Language
This document seeks to complement [The Rust Programming Language][] (TRPL). **This document is about advanced functionality and low-level development practices
in the Rust Programming Language. Most of the things discussed won't matter
to the average Rust programmer. However if you wish to correctly write unsafe
code in Rust, this text contains invaluable information.**
This document seeks to complement [The Rust Programming Language Book][] (TRPL).
Where TRPL introduces the language and teaches the basics, TURPL dives deep into Where TRPL introduces the language and teaches the basics, TURPL dives deep into
the specification of the language, and all the nasty bits necessary to write the specification of the language, and all the nasty bits necessary to write
Unsafe Rust. TURPL does not assume you have read TRPL, but does assume you know Unsafe Rust. TURPL does not assume you have read TRPL, but does assume you know
@ -10,7 +15,7 @@ stack or heap, we will not explain the syntax.
# Sections # Chapters
* [Data Layout](data.html) * [Data Layout](data.html)
* [Ownership and Lifetimes](lifetimes.html) * [Ownership and Lifetimes](lifetimes.html)
@ -48,7 +53,6 @@ Rust is 100% safe by default. Even when you *opt out* of safety in Rust, it is a
action. In deciding to work with unchecked uninitialized memory, this does not action. In deciding to work with unchecked uninitialized memory, this does not
suddenly make dangling or null pointers a problem. When using unchecked indexing on `x`, suddenly make dangling or null pointers a problem. When using unchecked indexing on `x`,
one does not have to suddenly worry about indexing out of bounds on `y`. one does not have to suddenly worry about indexing out of bounds on `y`.
C and C++, by contrast, have pervasive unsafety baked into the language. Even the C and C++, by contrast, have pervasive unsafety baked into the language. Even the
modern best practices like `unique_ptr` have various safety pitfalls. modern best practices like `unique_ptr` have various safety pitfalls.
@ -93,9 +97,11 @@ To be more concrete, Rust cares about preventing the following things:
* Unwinding into an FFI function * Unwinding into an FFI function
* Causing a data race * Causing a data race
However libraries are free to declare arbitrary requirements if they could transitively That's it. That's all the Undefined Behaviour in Rust. Libraries are free to
cause memory safety issues. However Rust is otherwise quite permisive with respect to declare arbitrary requirements if they could transitively cause memory safety
other dubious operations. Rust considers it "safe" to: issues, but it all boils down to the above actions. Rust is otherwise
quite permissive with respect to other dubious operations. Rust considers it
"safe" to:
* Deadlock * Deadlock
* Leak memory * Leak memory
@ -106,9 +112,9 @@ other dubious operations. Rust considers it "safe" to:
However any program that does such a thing is *probably* incorrect. Rust just isn't However any program that does such a thing is *probably* incorrect. Rust just isn't
interested in modeling these problems, as they are much harder to prevent in general, interested in modeling these problems, as they are much harder to prevent in general,
and it's basically impossible to prevent incorrect programs from getting written. and it's literally impossible to prevent incorrect programs from getting written.
Their are several places `unsafe` can appear in Rust today, which can largely be There are several places `unsafe` can appear in Rust today, which can largely be
grouped into two categories: grouped into two categories:
* There are unchecked contracts here. To declare you understand this, I require * There are unchecked contracts here. To declare you understand this, I require
@ -126,7 +132,7 @@ unchecked contracts:
* On trait implementations, `unsafe` is declaring that the contract of the * On trait implementations, `unsafe` is declaring that the contract of the
`unsafe` trait has been upheld. `unsafe` trait has been upheld.
* On blocks, `unsafe` is declaring any unsafety from an unsafe * On blocks, `unsafe` is declaring any unsafety from an unsafe
operation to be handled, and therefore the parent function is safe. operation within to be handled, and therefore the parent function is safe.
There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for
historical reasons and is in the process of being phased out. See the section on historical reasons and is in the process of being phased out. See the section on
@ -167,3 +173,109 @@ thread safety is a sort of fundamental thing that a program can't really guard
against locally (even by-value message passing still requires a notion of Send). against locally (even by-value message passing still requires a notion of Send).
# Working with unsafe
Rust generally only gives us the tools to talk about safety in a scoped and
binary manner. Unfortunately reality is significantly more complicated than that.
For instance, consider the following toy function:
```rust
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
if idx < arr.len() {
unsafe {
Some(*arr.get_unchecked(idx))
}
} else {
None
}
}
```
Clearly, this function is safe. We check that the index is in bounds, and if it
is, index into the array in an unchecked manner. But even in such a trivial
function, the scope of the unsafe block is questionable. Consider changing the
`<` to a `<=`:
```rust
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
if idx <= arr.len() {
unsafe {
Some(*arr.get_unchecked(idx))
}
} else {
None
}
}
```
This program is now unsound, and yet *we only modified safe code*. This is the
fundamental problem of safety: it's non-local. The soundness of our unsafe
operations necessarily depends on the state established by "safe" operations.
Although safety *is* modular (we *still* don't need to worry about
unrelated safety issues like uninitialized memory), it quickly contaminates the
surrounding code.
Trickier than that is when we get into actual statefulness. Consider a simple
implementation of `Vec`:
```rust
// Note this definition is insufficient. See the section on lifetimes.
struct Vec<T> {
ptr: *mut T,
len: usize,
cap: usize,
}
// Note this implementation does not correctly handle zero-sized types.
// We currently live in a nice imaginary world of only positive fixed-size
// types.
impl<T> Vec<T> {
fn new() -> Self {
Vec { ptr: heap::EMPTY, len: 0, cap: 0 }
}
fn push(&mut self, elem: T) {
if self.len == self.cap {
// not important for this example
self.reallocate();
}
unsafe {
ptr::write(self.ptr.offset(self.len as isize), elem);
self.len += 1;
}
}
fn pop(&mut self) -> Option<T> {
if self.len > 0 {
self.len -= 1;
unsafe {
Some(ptr::read(self.ptr.offset(self.len as isize)))
}
} else {
None
}
}
}
```
This code is simple enough to reasonably audit and verify. Now consider
adding the following method:
```rust
fn make_room(&mut self) {
// grow the capacity
self.cap += 1;
}
```
This code is safe, but it is also completely unsound. Changing the capacity
violates the invariants of Vec (that `cap` reflects the allocated space in the
Vec). This is not something the rest of `Vec` can guard against. It *has* to
trust the capacity field because there's no way to verify it.
`unsafe` does more than pollute a whole function: it pollutes a whole *module*.
Generally, the only bullet-proof way to limit the scope of unsafe code is at the
module boundary with privacy.
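As a rough illustration of limiting unsafety with privacy, here is a sketch of a module whose unsafe code relies on an invariant that only the module itself can break. The `checked` module and its `Cursor` type are invented for this example.

```rust
mod checked {
    /// Invariant: `idx < data.len()`. The fields are private, so only code
    /// inside this module is able to break it.
    pub struct Cursor {
        data: Vec<u8>,
        idx: usize,
    }

    impl Cursor {
        /// Returns `None` for empty input, since no valid index exists.
        pub fn new(data: Vec<u8>) -> Option<Cursor> {
            if data.is_empty() {
                None
            } else {
                Some(Cursor { data, idx: 0 })
            }
        }

        /// Moves to `idx` only if it is in bounds, preserving the invariant.
        pub fn seek(&mut self, idx: usize) -> bool {
            if idx < self.data.len() {
                self.idx = idx;
                true
            } else {
                false
            }
        }

        pub fn get(&self) -> u8 {
            // Sound only because every path that writes `idx` lives in
            // this module and keeps it in bounds.
            unsafe { *self.data.get_unchecked(self.idx) }
        }
    }
}

fn main() {
    let mut cursor = checked::Cursor::new(vec![10, 20, 30]).unwrap();
    assert!(cursor.seek(2));
    assert_eq!(cursor.get(), 30);
    // An out-of-bounds seek is refused and leaves the cursor untouched.
    assert!(!cursor.seek(3));
    assert_eq!(cursor.get(), 30);
    // `cursor.idx = 99;` would not compile here: the field is private.
}
```

Auditing `get` still means auditing every function in `checked`, but nothing outside the module can invalidate it.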

@ -13,7 +13,12 @@ point, really: Rust is about control. However we are not limited to just memory.
Pretty much every other system resource like a thread, file, or socket is exposed through Pretty much every other system resource like a thread, file, or socket is exposed through
this kind of API. this kind of API.
So, how does RAII work in Rust? Unlike C++, Rust does not come with a slew on builtin
# Constructors
Unlike C++, Rust does not come with a slew of builtin
kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors. kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors.
This largely has to do with Rust's philosophy of being explicit. This largely has to do with Rust's philosophy of being explicit.
@ -25,20 +30,26 @@ not happening in Rust (safely).
Assignment and copy constructors similarly don't exist because move semantics are the *default* Assignment and copy constructors similarly don't exist because move semantics are the *default*
in rust. At most `x = y` just moves the bits of y into the x variable. Rust does provide two in rust. At most `x = y` just moves the bits of y into the x variable. Rust does provide two
facilities for going back to C++'s copy-oriented semantics: `Copy` and `Clone`. Clone is our facilities for going back to C++'s copy-oriented semantics: `Copy` and `Clone`. Clone is our
moral equivalent of copy constructor, but it's never implicitly invoked. You have to explicitly moral equivalent of a copy constructor, but it's never implicitly invoked. You have to explicitly
call `clone` on an element you want to be cloned. Copy is a special case of Clone where the call `clone` on an element you want to be cloned. Copy is a special case of Clone where the
implementation is just "duplicate the bitwise representation". Copy types *are* implicitely implementation is just "copy the bits". Copy types *are* implicitly
cloned whenever they're moved, but because of the definition of Copy this just means *not* cloned whenever they're moved, but because of the definition of Copy this just means *not*
treating the old copy as uninitialized; a no-op. treating the old copy as uninitialized -- a no-op.
While Rust provides a `Default` trait for specifying the moral equivalent of a default While Rust provides a `Default` trait for specifying the moral equivalent of a default
constructor, it's incredibly rare for this trait to be used. This is because variables constructor, it's incredibly rare for this trait to be used. This is because variables
aren't implicitely initialized (see [working with uninitialized memory][uninit] for details). aren't implicitly initialized (see [working with uninitialized memory][uninit] for details).
Default is basically only useful for generic programming. Default is basically only useful for generic programming.
More often than not, in a concrete case a type will provide a static `new` method for any In concrete contexts, a type will provide a static `new` method for any
kind of "default" constructor. This has no relation to `new` in other languages and has no kind of "default" constructor. This has no relation to `new` in other
special meaning. It's just a naming convention. languages and has no special meaning. It's just a naming convention.
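To make the distinctions concrete, here is a small sketch contrasting `Copy`, `Clone`, `Default`, and a conventional `new`. The `Point` and `Label` types are placeholders invented for the example.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Clone, Debug, Default, PartialEq)]
struct Label {
    text: String,
}

impl Label {
    // `new` is only a naming convention: an ordinary associated function.
    fn new(text: &str) -> Label {
        Label { text: text.to_string() }
    }
}

fn main() {
    // `Copy`: assignment copies the bits and the source stays usable.
    let p = Point { x: 1, y: 2 };
    let q = p;
    assert_eq!(p, q);

    // Not `Copy`: assignment moves, so a duplicate needs an explicit `clone`.
    let a = Label::new("origin");
    let b = a.clone();
    let moved = a; // `a` is considered uninitialized from here on
    assert_eq!(b, moved);

    // `Default` is mostly useful in generic code.
    assert_eq!(Point::default(), Point { x: 0, y: 0 });
    assert_eq!(Label::default(), Label::new(""));
}
```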
# Destructors
What the language *does* provide is full-blown automatic destructors through the `Drop` trait, What the language *does* provide is full-blown automatic destructors through the `Drop` trait,
which provides the following method: which provides the following method:
@ -49,12 +60,19 @@ fn drop(&mut self);
This method gives the type time to somehow finish what it was doing. **After `drop` is run, This method gives the type time to somehow finish what it was doing. **After `drop` is run,
Rust will recursively try to drop all of the fields of the `self` struct**. This is a Rust will recursively try to drop all of the fields of the `self` struct**. This is a
convenience feature so that you don't have to write "destructor boilerplate" dropping convenience feature so that you don't have to write "destructor boilerplate" to drop
children. **There is no way to prevent this in Rust 1.0**. Also note that `&mut self` means children. If a struct has no special logic for being dropped other than dropping its
that even if you *could* supress recursive Drop, Rust will prevent you from e.g. moving fields children, then it means `Drop` doesn't need to be implemented at all!
out of self. For most types, this is totally fine: they own all their data, there's no
additional state passed into drop to try to send it to, and `self` is about to be marked as **There is no way to prevent this behaviour in Rust 1.0**.
uninitialized (and therefore inaccessible).
Note that taking `&mut self` means that even if you *could* suppress recursive Drop,
Rust will prevent you from e.g. moving fields out of self. For most types, this
is totally fine:
* They own all their data (they don't contain pointers to elsewhere).
* There's no additional state passed into drop to try to send things.
* `self` is about to be marked as uninitialized (and therefore inaccessible).
For instance, a custom implementation of `Box` might write `Drop` like this: For instance, a custom implementation of `Box` might write `Drop` like this:
@ -73,7 +91,7 @@ impl<T> Drop for Box<T> {
and this works fine because when Rust goes to drop the `ptr` field it just sees a *mut that and this works fine because when Rust goes to drop the `ptr` field it just sees a *mut that
has no actual `Drop` implementation. Similarly nothing can use-after-free the `ptr` because has no actual `Drop` implementation. Similarly nothing can use-after-free the `ptr` because
the Box is completely gone. the Box is immediately marked as uninitialized.
However this wouldn't work: However this wouldn't work:
@ -130,14 +148,14 @@ enum Link {
} }
``` ```
will have its inner Box field dropped *if and only if* a value stores the Next variant. will have its inner Box field dropped *if and only if* an instance stores the Next variant.
In general this works really nicely because you don't need to worry about adding/removing
dtors when you refactor your data layout. Still there's certainly many valid usecases for drops when you refactor your data layout. Still there's certainly many valid usecases for
needing to do trickier things with destructors. needing to do trickier things with destructors.
The classic safe solution to blocking recursive drop semantics and allowing moving out The classic safe solution to overriding recursive drop and allowing moving out
of Self is to use an Option: of Self during `drop` is to use an Option:
```rust ```rust
struct Box<T>{ ptr: *mut T } struct Box<T>{ ptr: *mut T }
@ -158,22 +176,255 @@ impl<T> Drop for SuperBox<T> {
unsafe { unsafe {
// Hyper-optimized: deallocate the box's contents for it // Hyper-optimized: deallocate the box's contents for it
// without `drop`ing the contents. Need to set the `box` // without `drop`ing the contents. Need to set the `box`
// fields as `None` to prevent Rust from trying to Drop it. // field as `None` to prevent Rust from trying to Drop it.
heap::deallocate(self.box.take().unwrap().ptr); heap::deallocate(self.box.take().unwrap().ptr);
} }
} }
} }
``` ```
However this has fairly odd semantics: you're saying that a field that *should* always be Some However this has fairly odd semantics: you're saying that a field that *should* always
may be None, just because that happens in the dtor. Of course this conversely makes a lot of sense: be Some may be None, just because that happens in the destructor. Of course this
you can call arbitrary methods on self during the destructor, and this should prevent you from conversely makes a lot of sense: you can call arbitrary methods on self during
ever doing so after deinitializing the field. Not that it will prevent you from producing any other the destructor, and this should prevent you from ever doing so after deinitializing
the field. Not that it will prevent you from producing any other
arbitrarily invalid state in there. arbitrarily invalid state in there.
On balance this is an ok choice. Certainly if you're just getting started. On balance this is an ok choice. Certainly what you should reach for by default.
However, in the future we expect there to be a first-class way to announce that
a field shouldn't be automatically dropped.
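The following sketch records the order in which destructors run, to show both the automatic recursive drop of fields and the `Option` trick for moving a field out during `drop`. All of the names here (`Log`, `Child`, `Parent`, `kept`, `taken`, `drop_order`) are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

struct Child {
    name: &'static str,
    log: Log,
}

impl Drop for Child {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

struct Parent {
    log: Log,
    // Dropped automatically once `Parent::drop` has returned.
    kept: Child,
    // `Option` lets `drop` move the child out through `&mut self`.
    taken: Option<Child>,
}

impl Drop for Parent {
    fn drop(&mut self) {
        self.log.borrow_mut().push("parent");
        // Move the child out and deal with it by hand. Here we simply drop
        // it early; the field is now `None`, so nothing is left to drop.
        let child = self.taken.take();
        drop(child);
        self.log.borrow_mut().push("parent done");
    }
}

// Drops a `Parent` and returns the order in which destructors ran.
fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let parent = Parent {
        log: log.clone(),
        kept: Child { name: "kept", log: log.clone() },
        taken: Some(Child { name: "taken", log: log.clone() }),
    };
    drop(parent);
    let order = log.borrow().clone();
    order
}

fn main() {
    assert_eq!(drop_order(), ["parent", "taken", "parent done", "kept"]);
}
```

The `kept` child is destroyed only after `Parent::drop` has finished, while the `taken` child is destroyed at a point of our choosing.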
# Leaking
Ownership based resource management is intended to simplify composition. You
acquire resources when you create the object, and you release the resources
when it gets destroyed. Since destruction is handled for you, it means you
can't forget to release the resources, and it happens as soon as possible!
Surely this is perfect and all of our problems are solved.
Everything is terrible and we have new and exotic problems to try to solve.
Many people like to believe that Rust eliminates resource leaks, but this
is absolutely not the case, no matter how you look at it. In the strictest
sense, "leaking" is so abstract as to be unpreventable. It's quite trivial
to initialize a collection at the start of a program, fill it with tons of
objects with destructors, and then enter an infinite event loop that never
refers to it. The collection will sit around uselessly, holding on to its
precious resources until the program terminates (at which point all those
resources would have been reclaimed by the OS anyway).
We may consider a more restricted form of leak: failing to free memory that
is unreachable. Rust also doesn't prevent this. In fact Rust has a *function
for doing this*: `mem::forget`. This function consumes the value it is passed
*and then doesn't run its destructor*.
In the past `mem::forget` was marked as unsafe as a sort of lint against using
it, since failing to call a destructor is generally not a well-behaved thing to
do (though useful for some special unsafe code). However this was generally
determined to be an untenable stance to take: there are *many* ways to fail to
call a destructor in safe code. The most famous example is creating a cycle
of reference counted pointers using interior mutability.
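Both ways of skipping a destructor can be demonstrated in entirely safe code. In this sketch `Noisy`, `Node`, `forget_one`, and `leak_cycle` are invented names; the counter exists only so that we can observe whether `drop` ran.

```rust
use std::cell::{Cell, RefCell};
use std::mem;
use std::rc::Rc;

// Bumps a shared counter when its destructor runs.
struct Noisy {
    drops: Rc<Cell<u32>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

struct Node {
    _noisy: Noisy,
    next: RefCell<Option<Rc<Node>>>,
}

// Returns how many destructors ran after `mem::forget`.
fn forget_one() -> u32 {
    let drops = Rc::new(Cell::new(0));
    mem::forget(Noisy { drops: drops.clone() });
    drops.get()
}

// Returns how many destructors ran after dropping a self-referential `Rc`.
fn leak_cycle() -> u32 {
    let drops = Rc::new(Cell::new(0));
    let node = Rc::new(Node {
        _noisy: Noisy { drops: drops.clone() },
        next: RefCell::new(None),
    });
    // Point the node at itself: the strong count is now 2.
    *node.next.borrow_mut() = Some(node.clone());
    drop(node); // the count falls to 1, never to 0
    drops.get()
}

fn main() {
    assert_eq!(forget_one(), 0);
    assert_eq!(leak_cycle(), 0);
}
```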
It is reasonable for safe code to assume that destructor leaks do not happen,
as any program that leaks destructors is probably wrong. However *unsafe* code
cannot rely on destructors to be run to be *safe*. For most types this doesn't
matter: if you leak the destructor then the type is *by definition* inaccessible,
so it doesn't matter, right? e.g. if you leak a `Box<u8>` then you waste some
memory but that's hardly going to violate memory-safety.
However where we must be careful with destructor leaks are *proxy* types.
These are types which manage access to a distinct object, but don't actually
own it. Proxy objects are quite rare. Proxy objects you'll need to care about
are even rarer. However we'll focus on two interesting examples in the
standard library:
* `vec::Drain`
* `Rc`
## Drain
`drain` is a collections API that moves data out of the container without
consuming the container. This enables us to reuse the allocation of a `Vec`
after claiming ownership over all of its contents. drain produces an iterator
(Drain) that returns the contents of the Vec by-value.
Now, consider Drain in the middle of iteration: some values have been moved out,
and others haven't. This means that part of the Vec is now full of logically
uninitialized data! We could backshift all the elements in the Vec every time we
remove a value, but this would have pretty catastrophic performance consequences.
Instead, we would like Drain to *fix* the Vec's backing storage when it is
dropped. It should run itself to completion, backshift any elements that weren't
removed (drain supports subranges), and then fix Vec's `len`. It's even
unwinding-safe! Easy!
Now consider the following:
```
let mut vec = vec![Box::new(0); 4];
{
// start draining, vec can no longer be accessed
let mut drainer = vec.drain(..);
// pull out two elements and immediately drop them
drainer.next();
drainer.next();
// get rid of drainer, but don't call its destructor
mem::forget(drainer);
}
// Oops, vec[0] was dropped, we're reading a pointer into free'd memory!
println!("{}", vec[0]);
```
This is pretty clearly Not Good. Unfortunately, we're kind of stuck between
a rock and a hard place: maintaining consistent state at every step has
an enormous cost (and would negate any benefits of the API). Failing to maintain
consistent state gives us Undefined Behaviour in safe code (making the API
unsound).
So what can we do? Well, we can pick a trivially consistent state: set the Vec's
len to be 0 when we *start* the iteration, and fix it up if necessary in the
destructor. That way, if everything executes like normal we get the desired
behaviour with minimal overhead. But if someone has the *audacity* to mem::forget
us in the middle of the iteration, all that does is *leak even more* (and possibly
leave the Vec in an *unexpected* but consistent state). Since we've
accepted that mem::forget is safe, this is definitely safe. We call leaks causing
more leaks a *leak amplification*.
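Here is a sketch of what leaking a `Drain` looks like against the standard library's `Vec`. The helper name `leak_drain` is made up, and the exact length left behind is treated as unspecified: the only thing this example relies on is that whatever remains in the `Vec` is valid.

```rust
use std::mem;

// Leaks a half-used `Drain` and reports the length the `Vec` is left with.
fn leak_drain() -> usize {
    let mut vec = vec![Box::new(0); 4];
    {
        let mut drainer = vec.drain(..);
        // pull out two elements and immediately drop them
        drainer.next();
        drainer.next();
        // get rid of drainer, but don't call its destructor
        mem::forget(drainer);
    }
    // Whatever is still in `vec` is initialized, so this is safe to walk.
    for elem in &vec {
        assert_eq!(**elem, 0);
    }
    vec.len()
}

fn main() {
    let len = leak_drain();
    // Elements may have been leaked, but none can have appeared.
    assert!(len <= 4);
    println!("len after leaking the drain: {}", len);
}
```

With the "set `len` to 0 up front" strategy described above, one would expect the `Vec` to come back empty, with the two undrained boxes leaked rather than double-freed.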
## Rc
Rc is an interesting case because at first glance it doesn't appear to be a
proxy value at all. After all, it manages the data it points to, and dropping
all the Rcs for a value will drop that value. Leaking an Rc doesn't seem like
it would be particularly dangerous. It will leave the refcount permanently
incremented and prevent the data from being freed or dropped, but that seems
just like Box, right?
Nope.
Let's consider a simplified implementation of Rc:
```rust
struct Rc<T> {
ptr: *mut RcBox<T>,
}
struct RcBox<T> {
data: T,
ref_count: usize,
}
impl<T> Rc<T> {
fn new(data: T) -> Self {
unsafe {
// Wouldn't it be nice if heap::allocate worked like this?
let ptr = heap::allocate::<RcBox<T>>();
ptr::write(ptr, RcBox {
data: data,
ref_count: 1,
});
Rc { ptr: ptr }
}
}
fn clone(&self) -> Self {
unsafe {
(*self.ptr).ref_count += 1;
}
Rc { ptr: self.ptr }
}
}
impl<T> Drop for Rc<T> {
fn drop(&mut self) {
unsafe {
(*self.ptr).ref_count -= 1;
if (*self.ptr).ref_count == 0 {
// drop the data and then free it
ptr::read(self.ptr);
heap::deallocate(self.ptr);
}
}
}
}
```
This code contains an implicit and subtle assumption: ref_count can fit in a
`usize`, because there can't be more than `usize::MAX` Rcs in memory. However
this itself assumes that the ref_count accurately reflects the number of Rcs
in memory, which we know is false with mem::forget. Using mem::forget we can
overflow the ref_count, and then get it down to 0 with outstanding Rcs. Then we
can happily use-after-free the inner data. Bad Bad Not Good.
This can be solved by *saturating* the ref_count, which is sound because
decreasing the refcount by `n` still requires `n` Rcs simultaneously living
in memory.
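A saturating count might be sketched as follows. `RefCount` is a hypothetical stand-in for the `ref_count` field of the simplified Rc above, and this is not a claim about how any real reference-counted pointer is implemented.

```rust
struct RefCount(usize);

impl RefCount {
    // Saturating increment: once pinned at `usize::MAX`, forgotten handles
    // can no longer wrap the count around to a small value.
    fn increment(&mut self) {
        self.0 = self.0.saturating_add(1);
    }

    // Returns true when the last handle is gone and the data may be freed.
    fn decrement(&mut self) -> bool {
        self.0 -= 1;
        self.0 == 0
    }
}

fn main() {
    // An ordinary clone followed by two drops.
    let mut count = RefCount(1);
    count.increment();
    assert!(!count.decrement());
    assert!(count.decrement());

    // At the limit, further clones leave the count pinned instead of
    // overflowing to 0.
    let mut pinned = RefCount(usize::MAX);
    pinned.increment();
    assert_eq!(pinned.0, usize::MAX);
}
```

Getting a pinned count back down to 0 would take `usize::MAX` decrements, each of which needs a live handle to drop, so the data is leaked rather than freed early.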
## thread::scoped
The thread::scoped API intends to allow threads to be spawned that reference
data on the stack without any synchronization over that data. Usage looked like:
```rust
let mut data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
{
let mut guards = vec![];
for x in &mut data {
// Move the mutable reference into the closure, and execute
// it on a different thread. The closure has a lifetime bound
// by the lifetime of the mutable reference `x` we store in it.
// The guard that is returned is in turn assigned the lifetime
// of the closure, so it also mutably borrows `data` as `x` did.
// This means we cannot access `data` until the guard goes away.
let guard = thread::scoped(move || {
*x *= 2;
});
// store the thread's guard for later
guards.push(guard);
}
// All guards are dropped here, forcing the threads to join
// (this thread blocks here until the others terminate).
// Once the threads join, the borrow expires and the data becomes
// accessible again in this thread.
}
// data is definitely mutated here.
```
In principle, this totally works! Rust's ownership system perfectly ensures it!
...except it relies on a destructor being called to be safe.
```
let mut data = Box::new(0);
{
let guard = thread::scoped(|| {
// This is at best a data race. At worst, it's *also* a use-after-free.
*data += 1;
});
// The guard is forgotten, so the loan expires without blocking this
// thread.
mem::forget(guard);
}
// So the Box is dropped here while the scoped thread may or may not be trying
// to access it.
```
In the future, we expect there to be a first-class way to announce that a field Dang. Here the destructor running was pretty fundamental to the API, and it had
should be automatically dropped. to be scrapped in favour of a completely different design.
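For comparison, one design that does not depend on a guard's destructor is a closure-based scope, which later versions of the standard library expose as `std::thread::scope`. The sketch below assumes a toolchain recent enough to have that function; `double_all` is a made-up helper.

```rust
use std::thread;

// Doubles every element in place, one scoped thread per element.
fn double_all(data: &mut [i32]) {
    thread::scope(|s| {
        for x in data.iter_mut() {
            // Each thread gets exclusive access to one element.
            s.spawn(move || {
                *x *= 2;
            });
        }
        // `scope` joins every spawned thread before it returns, regardless
        // of what happens to the individual join handles.
    });
}

fn main() {
    let mut data = [1, 2, 3, 4, 5];
    double_all(&mut data);
    assert_eq!(data, [2, 4, 6, 8, 10]);
}
```

Because the joining happens inside `scope` itself rather than in a destructor, there is no guard for the caller to `mem::forget`.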
[uninit]: uninitialized.html [uninit]: uninitialized.html