diff --git a/README.md b/README.md index 4227beb..e391c9a 100644 --- a/README.md +++ b/README.md @@ -1 +1,3 @@ -# turpl +# The Unsafe Rust Programming Language (Book) + +[Start at the intro](http://www.cglab.ca/~abeinges/blah/turpl/intro.html) \ No newline at end of file diff --git a/conversions.md b/conversions.md index 4fe8628..ae3ce9e 100644 --- a/conversions.md +++ b/conversions.md @@ -5,8 +5,6 @@ are just there to help us use those bits right. Needing to reinterpret those pil of bits as different types is a common problem and Rust consequently gives you several ways to do that. -# Safe Rust - First we'll look at the ways that *Safe Rust* gives you to reinterpret values. The most trivial way to do this is to just destructure a value into its constituent parts and then build a new type out of them. e.g. @@ -31,42 +29,191 @@ fn reinterpret(foo: Foo) -> Bar { But this is, at best, annoying to do. For common conversions, rust provides more ergonomic alternatives. -## Auto-Deref + + + +# Auto-Deref + +(Maybe nix this in favour of receiver coercions) Deref is a trait that allows you to overload the unary `*` to specify a type you dereference to. This is largely only intended to be implemented by pointer types like `&`, `Box`, and `Rc`. The dot operator will automatically perform -automatic dereferencing, so that foo.bar() will work uniformly on `Foo`, `&Foo`, `&&Foo`, -`&Rc>>` and so-on. Search bottoms out on the *first* match, +automatic dereferencing, so that foo.bar() will work uniformly on `Foo`, `&Foo`, ` +&&Foo`, `&Rc>>` and so-on. Search bottoms out on the *first* match, so implementing methods on pointers is generally to be avoided, as it will shadow "actual" methods. -## Coercions -Types can implicitly be coerced to change in certain contexts. These changes are generally -just *weakening* of types, largely focused around pointers. They mostly exist to make -Rust "just work" in more cases. 
For instance -`&mut T` coerces to `&T`, and `&T` coerces to `*const T`. The most useful coercion you will -actually think about it is probably the general *Deref Coercion*: `&T` coerces to `&U` when -`T: Deref`. This enables us to pass an `&String` where an `&str` is expected, for instance. -## Casts -Casts are a superset of coercions: every coercion can be explicitly invoked via a cast, -but some changes require a cast. These "true casts" are generally regarded as dangerous or -problematic actions. True casts revolves around raw pointers and the primitive numeric -types. Here's an exhaustive list of all the true casts: +# Coercions + +Types can implicitly be coerced to change in certain contexts. These changes are +generally just *weakening* of types, largely focused around pointers and lifetimes. +They mostly exist to make Rust "just work" in more cases, and are largely harmless. + +Here's all the kinds of coercion: + + +Coercion is allowed between the following types: + +* `T` to `U` if `T` is a [subtype](lifetimes.html#subtyping-and-variance) + of `U` (the 'identity' case); + +* `T_1` to `T_3` where `T_1` coerces to `T_2` and `T_2` coerces to `T_3` + (transitivity case); + +* `&mut T` to `&T`; + +* `*mut T` to `*const T`; + +* `&T` to `*const T`; + +* `&mut T` to `*mut T`; + +* `T` to `U` if `T` implements `CoerceUnsized` (see below) and `T = Foo<...>` + and `U = Foo<...>`; + +* From TyCtor(`T`) to TyCtor(coerce_inner(`T`)); + +where TyCtor(`T`) is one of `&T`, `&mut T`, `*const T`, `*mut T`, or `Box`. 
+And where coerce_inner is defined as + +* coerce_inner(`[T, ..n]`) = `[T]`; + +* coerce_inner(`T`) = `U` where `T` is a concrete type which implements the + trait `U`; + +* coerce_inner(`T`) = `U` where `T` is a sub-trait of `U`; + +* coerce_inner(`Foo<..., T, ...>`) = `Foo<..., coerce_inner(T), ...>` where + `Foo` is a struct and only the last field has type `T` and `T` is not part of + the type of any other fields; + +* coerce_inner(`(..., T)`) = `(..., coerce_inner(T))`. + +Coercions only occur at a *coercion site*. Exhaustively, the coercion sites +are: + +* In `let` statements where an explicit type is given: in `let _: U = e;`, `e` + is coerced to have type `U`; + +* In statics and consts, similarly to `let` statements; + +* In argument position for function calls. The value being coerced is the actual + parameter and it is coerced to the type of the formal parameter. For example, + where `foo` is defined as `fn foo(x: U) { ... }` and is called with `foo(e);`, + `e` is coerced to have type `U`; + +* Where a field of a struct or variant is instantiated. E.g., where `struct Foo + { x: U }` and the instantiation is `Foo { x: e }`, `e` is coerced to have + type `U`; + +* The result of a function, either the final line of a block if it is not + semicolon-terminated or any expression in a `return` statement. For example, for + `fn foo() -> U { e }`, `e` is coerced to have type `U`. -TODO: gank the RFC for sweet casts +If the expression in one of these coercion sites is a coercion-propagating +expression, then the relevant sub-expressions in that expression are also +coercion sites. Propagation recurses from these new coercion sites. 
Propagating +expressions and their relevant sub-expressions are: -For number -> number casts, there are quite a few cases to consider: +* array literals, where the array has type `[U, ..n]`, each sub-expression in + the array literal is a coercion site for coercion to type `U`; + +* array literals with repeating syntax, where the array has type `[U, ..n]`, the + repeated sub-expression is a coercion site for coercion to type `U`; + +* tuples, where a tuple is a coercion site to type `(U_0, U_1, ..., U_n)`, each + sub-expression is a coercion site for the respective type, e.g., the zero-th + sub-expression is a coercion site to `U_0`; + +* the box expression, if the expression has type `Box`, the sub-expression is + a coercion site to `U`; + +* parenthesised sub-expressions (`(e)`), if the expression has type `U`, then + the sub-expression is a coercion site to `U`; + +* blocks, if a block has type `U`, then the last expression in the block (if it + is not semicolon-terminated) is a coercion site to `U`. This includes blocks + which are part of control flow statements, such as `if`/`else`, if the block + has a known type. + + +Note that we do not perform coercions when matching traits (except for +receivers, see below). If there is an impl for some type `U` and `T` coerces to +`U`, that does not constitute an implementation for `T`. For example, the +following will not type check, even though it is OK to coerce `t` to `&T` and +there is an impl for `&T`: + +``` +struct T; +trait Trait {} + +fn foo(t: X) {} + +impl<'a> Trait for &'a T {} + + +fn main() { + let t: &mut T = &mut T; + foo(t); //~ ERROR failed to find an implementation of trait Trait for &mut T +} +``` + +In a cast expression, `e as U`, the compiler will first attempt to coerce `e` to +`U`, only if that fails will the conversion rules for casts (see below) be +applied. 
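As a rough illustration of the coercion rules and coercion sites listed above, here is a minimal sketch written against current stable Rust (so it uses `dyn Trait` and `[T; N]` syntax that postdates this text). The helper functions `takes_shared`, `takes_slice`, and `takes_str` are hypothetical names invented for this example, not part of any library.

```rust
use std::fmt::Display;

// Hypothetical helpers, named only for this sketch. Each parameter is a
// coercion site: the argument is coerced to the formal parameter's type.
fn takes_shared(x: &i32) -> i32 {
    *x
}

fn takes_slice(xs: &[u8]) -> usize {
    xs.len()
}

fn takes_str(s: &str) -> usize {
    s.len()
}

fn main() {
    let mut n = 5;

    // `&mut T` to `&T` in argument position.
    let r: &mut i32 = &mut n;
    assert_eq!(takes_shared(r), 5);

    // `&T` to `*const T` in a `let` with an explicit type.
    let p: *const i32 = &n;
    assert!(!p.is_null());

    // Unsizing: `&[u8; 3]` to `&[u8]`.
    let arr = [1u8, 2, 3];
    assert_eq!(takes_slice(&arr), 3);

    // Unsizing a concrete type to a trait it implements.
    let d: &dyn Display = &n;
    assert_eq!(d.to_string(), "5");

    // Deref coercion: `&String` to `&str`.
    let s = String::from("hello");
    assert_eq!(takes_str(&s), 5);
}
```

None of these conversions needs an `as` cast, because each one happens at a coercion site where the target type is already known.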
+ + + + +# Casts + +Casts are a superset of coercions: every coercion can be explicitly invoked via a +cast, but some conversions *require* a cast. These "true casts" are generally regarded +as dangerous or problematic actions. True casts revolve around raw pointers and +the primitive numeric types. True casts aren't checked. + +Here's an exhaustive list of all the true casts: + + * `e` has type `T` and `T` coerces to `U`; *coercion-cast* + * `e` has type `*T`, `U` is `*U_0`, and either `U_0: Sized` or + unsize_kind(`T`) = unsize_kind(`U_0`); *ptr-ptr-cast* + * `e` has type `*T` and `U` is a numeric type, while `T: Sized`; *ptr-addr-cast* + * `e` is an integer and `U` is `*U_0`, while `U_0: Sized`; *addr-ptr-cast* + * `e` has type `T` and `T` and `U` are any numeric types; *numeric-cast* + * `e` is a C-like enum and `U` is an integer type; *enum-cast* + * `e` has type `bool` or `char` and `U` is an integer; *prim-int-cast* + * `e` has type `u8` and `U` is `char`; *u8-char-cast* + * `e` has type `&[T; n]` and `U` is `*const T`; *array-ptr-cast* + * `e` is a function pointer type and `U` has type `*T`, + while `T: Sized`; *fptr-ptr-cast* + * `e` is a function pointer type and `U` is an integer; *fptr-addr-cast* + +where `&.T` and `*T` are references of either mutability, +and where unsize_kind(`T`) is the kind of the unsize info +in `T` - the vtable for a trait definition (e.g. `fmt::Display` or +`Iterator`, not `Iterator`) or a length (or `()` if `T: Sized`). + +Note that lengths are not adjusted when casting raw slices - +`T: *const [u16] as *const [u8]` creates a slice that only includes +half of the original memory. + +Casting is not transitive, that is, even if `e as U1 as U2` is a valid +expression, `e as U2` is not necessarily so (in fact it will only be valid if +`U1` coerces to `U2`). + +For numeric casts, there are quite a few cases to consider: * casting between two integers of the same size (e.g. 
i32 -> u32) is a no-op * casting from a larger integer to a smaller integer (e.g. u32 -> u8) will truncate * casting from a smaller integer to a larger integer (e.g. u8 -> u32) will - * zero-extend if unsigned - * sign-extend if signed -* casting from a float to an integer will round the float towards zero. + * zero-extend if the source is unsigned + * sign-extend if the source is signed +* casting from a float to an integer will: + * round the float towards zero if finite * **NOTE: currently this will cause Undefined Behaviour if the rounded value cannot be represented by the target integer type**. This is a bug and will be fixed. @@ -86,18 +233,14 @@ well as interpret integers as addresses. However it is impossible to actually `unsafe`. -## Conversion Traits -For full formal specification of all the kinds of coercions and coercion sites, see: -https://github.com/rust-lang/rfcs/blob/master/text/0401-coercions.md + +# Conversion Traits + +TODO -* Coercions -* Casts -* Conversion Traits (Into/As/...) -# Unsafe Rust +# Transmuting Types -* raw ptr casts -* mem::transmute diff --git a/intro.md b/intro.md index 7d3036f..b955d78 100644 --- a/intro.md +++ b/intro.md @@ -1,6 +1,11 @@ % The Unsafe Rust Programming Language -This document seeks to complement [The Rust Programming Language][] (TRPL). +**This document is about advanced functionality and low-level development practices +in the Rust Programming Language. Most of the things discussed won't matter +to the average Rust programmer. However if you wish to correctly write unsafe +code in Rust, this text contains invaluable information.** + +This document seeks to complement [The Rust Programming Language Book][] (TRPL). Where TRPL introduces the language and teaches the basics, TURPL dives deep into the specification of the language, and all the nasty bits necessary to write Unsafe Rust. TURPL does not assume you have read TRPL, but does assume you know @@ -10,7 +15,7 @@ stack or heap, we will not explain the syntax. 
-# Sections +# Chapters * [Data Layout](data.html) * [Ownership and Lifetimes](lifetimes.html) @@ -48,7 +53,6 @@ Rust is 100% safe by default. Even when you *opt out* of safety in Rust, it is a action. In deciding to work with unchecked uninitialized memory, this does not suddenly make dangling or null pointers a problem. When using unchecked indexing on `x`, one does not have to suddenly worry about indexing out of bounds on `y`. - C and C++, by contrast, have pervasive unsafety baked into the language. Even the modern best practices like `unique_ptr` have various safety pitfalls. @@ -85,17 +89,19 @@ To be more concrete, Rust cares about preventing the following things: * Breaking the pointer aliasing rules (TBD) (llvm rules + noalias on &mut and & w/o UnsafeCell) * Invoking Undefined Behaviour (in e.g. compiler intrinsics) * Producing invalid primitive values: - * dangling/null references - * a `bool` that isn't 0 or 1 - * an undefined `enum` discriminant - * a `char` larger than char::MAX - * A non-utf8 `str` + * dangling/null references + * a `bool` that isn't 0 or 1 + * an undefined `enum` discriminant + * a `char` larger than char::MAX + * A non-utf8 `str` * Unwinding into an FFI function * Causing a data race -However libraries are free to declare arbitrary requirements if they could transitively -cause memory safety issues. However Rust is otherwise quite permisive with respect to -other dubious operations. Rust considers it "safe" to: +That's it. That's all the Undefined Behaviour in Rust. Libraries are free to +declare arbitrary requirements if they could transitively cause memory safety +issues, but it all boils down to the above actions. Rust is otherwise +quite permisive with respect to other dubious operations. Rust considers it +"safe" to: * Deadlock * Leak memory @@ -106,27 +112,27 @@ other dubious operations. Rust considers it "safe" to: However any program that does such a thing is *probably* incorrect. 
Rust just isn't interested in modeling these problems, as they are much harder to prevent in general, -and it's basically impossible to prevent incorrect programs from getting written. +and it's literally impossible to prevent incorrect programs from getting written. -Their are several places `unsafe` can appear in Rust today, which can largely be +There are several places `unsafe` can appear in Rust today, which can largely be grouped into two categories: * There are unchecked contracts here. To declare you understand this, I require you to write `unsafe` elsewhere: * On functions, `unsafe` is declaring the function to be unsafe to call. Users - of the function must check the documentation to determine what this means, - and then have to write `unsafe` somewhere to identify that they're aware of + of the function must check the documentation to determine what this means, + and then have to write `unsafe` somewhere to identify that they're aware of the danger. * On trait declarations, `unsafe` is declaring that *implementing* the trait - is an unsafe operation, as it has contracts that other unsafe code is free to - trust blindly. + is an unsafe operation, as it has contracts that other unsafe code is free to + trust blindly. * I am declaring that I have, to the best of my knowledge, adhered to the unchecked contracts: * On trait implementations, `unsafe` is declaring that the contract of the - `unsafe` trait has been upheld. + `unsafe` trait has been upheld. * On blocks, `unsafe` is declaring any unsafety from an unsafe - operation to be handled, and therefore the parent function is safe. + operation within to be handled, and therefore the parent function is safe. There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for historical reasons and is in the process of being phased out. See the section on @@ -135,21 +141,21 @@ destructors for details. 
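The four places `unsafe` can appear, as grouped above, can be sketched in a few lines. This is only an illustration under stated assumptions: the trait `AlwaysNonEmpty`, the type `Pair`, and the functions `get_unchecked_u8` and `first` are hypothetical names made up for this example.

```rust
/// `unsafe` on a function: there is an unchecked contract here.
/// The caller must guarantee `idx < xs.len()`.
unsafe fn get_unchecked_u8(xs: &[u8], idx: usize) -> u8 {
    // The contract of this function is exactly the contract of
    // `get_unchecked`, so it is forwarded to our caller.
    unsafe { *xs.get_unchecked(idx) }
}

/// `unsafe` on a trait declaration: implementing it is an unsafe operation.
/// Implementors promise that `items` never returns an empty slice.
unsafe trait AlwaysNonEmpty {
    fn items(&self) -> &[u8];
}

struct Pair([u8; 2]);

// `unsafe` on a trait implementation: we declare that the contract of the
// trait has been upheld (a two-element array is never empty).
unsafe impl AlwaysNonEmpty for Pair {
    fn items(&self) -> &[u8] {
        &self.0
    }
}

/// A safe function. The `unsafe` block declares that the unchecked contract
/// of `get_unchecked_u8` is satisfied, relying on the trait's promise.
fn first<T: AlwaysNonEmpty>(t: &T) -> u8 {
    unsafe { get_unchecked_u8(t.items(), 0) }
}

fn main() {
    assert_eq!(first(&Pair([7, 9])), 7);
}
```

Note how `first` is only sound because `AlwaysNonEmpty` is an `unsafe` trait: the unsafe code trusts the implementor's promise blindly.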
Some examples of unsafe functions: * `slice::get_unchecked` will perform unchecked indexing, allowing memory -safety to be freely violated. + safety to be freely violated. * `ptr::offset` in an intrinsic that invokes Undefined Behaviour if it is -not "in bounds" as defined by LLVM (see the lifetimes section for details). + not "in bounds" as defined by LLVM (see the lifetimes section for details). * `mem::transmute` reinterprets some value as having the given type, -bypassing type safety in arbitrary ways. (see the conversions section for details) + bypassing type safety in arbitrary ways. (see the conversions section for details) * All FFI functions are `unsafe` because they can do arbitrary things. -C being an obvious culprit, but generally any language can do something -that Rust isn't happy about. (see the FFI section for details) + C being an obvious culprit, but generally any language can do something + that Rust isn't happy about. (see the FFI section for details) As of Rust 1.0 there are exactly two unsafe traits: * `Send` is a marker trait (it has no actual API) that promises implementors -are safe to send to another thread. + are safe to send to another thread. * `Sync` is a marker trait that promises that threads can safely share -implementors through a shared reference. + implementors through a shared reference. All other traits that declare any kind of contract *really* can't be trusted to adhere to their contract when memory-safety is at stake. For instance Rust has @@ -167,3 +173,109 @@ thread safety is a sort of fundamental thing that a program can't really guard against locally (even by-value message passing still requires a notion Send). + + +# Working with unsafe + +Rust generally only gives us the tools to talk about safety in a scoped and +binary manner. Unfortunately reality is significantly more complicated than that. 
+For instance, consider the following toy function: + +```rust +fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> { + if idx < arr.len() { + unsafe { + Some(*arr.get_unchecked(idx)) + } + } else { + None + } +} +``` + +Clearly, this function is safe. We check that the index is in bounds, and if it +is, index into the array in an unchecked manner. But even in such a trivial +function, the scope of the unsafe block is questionable. Consider changing the +`<` to a `<=`: + +```rust +fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> { + if idx <= arr.len() { + unsafe { + Some(*arr.get_unchecked(idx)) + } + } else { + None + } +} +``` + +This program is now unsound, and yet *we only modified safe code*. This is the +fundamental problem of safety: it's non-local. The soundness of our unsafe +operations necessarily depends on the state established by "safe" operations. +Although safety *is* modular (we *still* don't need to worry about +unrelated safety issues like uninitialized memory), it quickly contaminates the +surrounding code. + +Trickier than that is when we get into actual statefulness. Consider a simple +implementation of `Vec`: + +```rust +// Note this definition is insufficient. See the section on lifetimes. +struct Vec<T> { + ptr: *mut T, + len: usize, + cap: usize, +} + +// Note this implementation does not correctly handle zero-sized types. +// We currently live in a nice imaginary world of only positive fixed-size +// types. +impl<T> Vec<T> { + fn new() -> Self { + Vec { ptr: heap::EMPTY, len: 0, cap: 0 } + } + + fn push(&mut self, elem: T) { + if self.len == self.cap { + // not important for this example + self.reallocate(); + } + unsafe { + ptr::write(self.ptr.offset(self.len as isize), elem); + self.len += 1; + } + } + + fn pop(&mut self) -> Option<T> { + if self.len > 0 { + self.len -= 1; + unsafe { + Some(ptr::read(self.ptr.offset(self.len as isize))) + } + } else { + None + } + } +} +``` + +This code is simple enough to reasonably audit and verify. 
Now consider +adding the following method: + +```rust + fn make_room(&mut self) { + // grow the capacity + self.cap += 1; + } +``` + +This code is safe, but it is also completely unsound. Changing the capacity +violates the invariants of Vec (that `cap` reflects the allocated space in the +Vec). This is not something the rest of `Vec` can guard against. It *has* to +trust the capacity field because there's no way to verify it. + +`unsafe` does more than pollute a whole function: it pollutes a whole *module*. +Generally, the only bullet-proof way to limit the scope of unsafe code is at the +module boundary with privacy. + diff --git a/raii.md b/raii.md index 7636303..679c1dd 100644 --- a/raii.md +++ b/raii.md @@ -13,7 +13,12 @@ point, really: Rust is about control. However we are not limited to just memory. Pretty much every other system resource like a thread, file, or socket is exposed through this kind of API. -So, how does RAII work in Rust? Unlike C++, Rust does not come with a slew on builtin + + + +# Constructors + +Unlike C++, Rust does not come with a slew of builtin kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors. This largely has to do with Rust's philosophy of being explicit. @@ -25,20 +30,26 @@ not happening in Rust (safely). Assignment and copy constructors similarly don't exist because move semantics are the *default* in rust. At most `x = y` just moves the bits of y into the x variable. Rust does provide two facilities for going back to C++'s copy-oriented semantics: `Copy` and `Clone`. Clone is our -moral equivalent of copy constructor, but it's never implicitly invoked. You have to explicitly +moral equivalent of a copy constructor, but it's never implicitly invoked. You have to explicitly call `clone` on an element you want to be cloned. Copy is a special case of Clone where the -implementation is just "duplicate the bitwise representation". 
Copy types *are* implicitely +implementation is just "copy the bits". Copy types *are* implicitly cloned whenever they're moved, but because of the definition of Copy this just means *not* -treating the old copy as uninitialized; a no-op. +treating the old copy as uninitialized -- a no-op. While Rust provides a `Default` trait for specifying the moral equivalent of a default constructor, it's incredibly rare for this trait to be used. This is because variables -aren't implicitely initialized (see [working with uninitialized memory][uninit] for details). +aren't implicitly initialized (see [working with uninitialized memory][uninit] for details). Default is basically only useful for generic programming. -More often than not, in a concrete case a type will provide a static `new` method for any -kind of "default" constructor. This has no relation to `new` in other languages and has no -special meaning. It's just a naming convention. +In concrete contexts, a type will provide a static `new` method for any +kind of "default" constructor. This has no relation to `new` in other +languages and has no special meaning. It's just a naming convention. + + + + + +# Destructors What the language *does* provide is full-blown automatic destructors through the `Drop` trait, which provides the following method: @@ -49,12 +60,19 @@ fn drop(&mut self); This method gives the type time to somehow finish what it was doing. **After `drop` is run, Rust will recursively try to drop all of the fields of the `self` struct**. This is a -convenience feature so that you don't have to write "destructor boilerplate" dropping -children. **There is no way to prevent this in Rust 1.0**. Also note that `&mut self` means -that even if you *could* supress recursive Drop, Rust will prevent you from e.g. moving fields -out of self. 
For most types, this is totally fine: they own all their data, there's no -additional state passed into drop to try to send it to, and `self` is about to be marked as -uninitialized (and therefore inaccessible). +convenience feature so that you don't have to write "destructor boilerplate" to drop +children. If a struct has no special logic for being dropped other than dropping its +children, then it means `Drop` doesn't need to be implemented at all! + +**There is no way to prevent this behaviour in Rust 1.0**. + +Note that taking `&mut self` means that even if you *could* suppress recursive Drop, +Rust will prevent you from e.g. moving fields out of self. For most types, this +is totally fine: + +* They own all their data (they don't contain pointers to elsewhere). +* There's no additional state passed into drop to try to send things. +* `self` is about to be marked as uninitialized (and therefore inaccessible). For instance, a custom implementation of `Box` might write `Drop` like this: @@ -73,7 +91,7 @@ impl Drop for Box { and this works fine because when Rust goes to drop the `ptr` field it just sees a *mut that has no actual `Drop` implementation. Similarly nothing can use-after-free the `ptr` because -the Box is completely gone. +the Box is immediately marked as uninitialized. However this wouldn't work: @@ -130,14 +148,14 @@ enum Link { } ``` -will have its inner Box field dropped *if and only if* a value stores the Next variant. +will have its inner Box field dropped *if and only if* an instance stores the Next variant. In general this works really nice because you don't need to worry about adding/removing -dtors when you refactor your data layout. Still there's certainly many valid usecases for +drops when you refactor your data layout. Still there's certainly many valid usecases for needing to do trickier things with destructors. 
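To make the recursive drop behaviour described above concrete, here is a small sketch that records the order in which destructors run. The types `Noisy` and `Parent` and the function `drop_order` are hypothetical names used only for this illustration, and the code is written against current stable Rust.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log that each destructor appends to.
type Log = Rc<RefCell<Vec<&'static str>>>;

struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

struct Parent {
    // Never read directly; it exists to be dropped after `Parent::drop`.
    _child: Noisy,
    log: Log,
}

impl Drop for Parent {
    fn drop(&mut self) {
        // Runs first. The fields are dropped afterwards, automatically.
        self.log.borrow_mut().push("parent");
    }
}

fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let parent = Parent {
        _child: Noisy { name: "child", log: log.clone() },
        log: log.clone(),
    };
    drop(parent);
    let order = log.borrow().clone();
    order
}

fn main() {
    // `Parent::drop` runs, then its `_child` field is dropped.
    assert_eq!(drop_order(), vec!["parent", "child"]);
}
```

`Parent` never mentions `_child` in its own `drop`, yet the child's destructor still runs: that is the "destructor boilerplate" the language writes for you.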
-The classic safe solution to blocking recursive drop semantics and allowing moving out -of Self is to use an Option: +The classic safe solution to overriding recursive drop and allowing moving out +of Self during `drop` is to use an Option: ```rust struct Box{ ptr: *mut T } @@ -158,22 +176,255 @@ impl Drop for SuperBox { unsafe { // Hyper-optimized: deallocate the box's contents for it // without `drop`ing the contents. Need to set the `box` - // fields as `None` to prevent Rust from trying to Drop it. + // field as `None` to prevent Rust from trying to Drop it. heap::deallocate(self.box.take().unwrap().ptr); } } } ``` -However this has fairly odd semantics: you're saying that a field that *should* always be Some -may be None, just because that happens in the dtor. Of course this conversely makes a lot of sense: -you can call arbitrary methods on self during the destructor, and this should prevent you from -ever doing so after deinitializing the field. Not that it will prevent you from producing any other +However this has fairly odd semantics: you're saying that a field that *should* always +be Some may be None, just because that happens in the destructor. Of course this +conversely makes a lot of sense: you can call arbitrary methods on self during +the destructor, and this should prevent you from ever doing so after deinitializing +the field. Not that it will prevent you from producing any other arbitrarily invalid state in there. -On balance this is an ok choice. Certainly if you're just getting started. +On balance this is an ok choice. Certainly what you should reach for by default. +However, in the future we expect there to be a first-class way to announce that +a field shouldn't be automatically dropped. + + + + +# Leaking + +Ownership based resource management is intended to simplify composition. You +acquire resources when you create the object, and you release the resources +when it gets destroyed. 
Since destruction is handled for you, it means you +can't forget to release the resources, and it happens as soon as possible! +Surely this is perfect and all of our problems are solved. + +Everything is terrible and we have new and exotic problems to try to solve. + +Many people like to believe that Rust eliminates resource leaks, but this +is absolutely not the case, no matter how you look at it. In the strictest +sense, "leaking" is so abstract as to be unpreventable. It's quite trivial +to initialize a collection at the start of a program, fill it with tons of +objects with destructors, and then enter an infinite event loop that never +refers to it. The collection will sit around uselessly, holding on to its +precious resources until the program terminates (at which point all those +resources would have been reclaimed by the OS anyway). + +We may consider a more restricted form of leak: failing to free memory that +is unreachable. Rust also doesn't prevent this. In fact Rust has a *function +for doing this*: `mem::forget`. This function consumes the value it is passed +*and then doesn't run its destructor*. + +In the past `mem::forget` was marked as unsafe as a sort of lint against using +it, since failing to call a destructor is generally not a well-behaved thing to +do (though useful for some special unsafe code). However this was generally +determined to be an untenable stance to take: there are *many* ways to fail to +call a destructor in safe code. The most famous example is creating a cycle +of reference counted pointers using interior mutability. + +It is reasonable for safe code to assume that destructor leaks do not happen, +as any program that leaks destructors is probably wrong. However *unsafe* code +cannot rely on destructors to be run to be *safe*. For most types this doesn't +matter: if you leak the destructor then the type is *by definition* inaccessible, +so it doesn't matter, right? e.g. 
if you leak a `Box` then you waste some +memory but that's hardly going to violate memory-safety. + +Where we must be careful with destructor leaks, however, is with *proxy* types. +These are types which manage access to a distinct object, but don't actually +own it. Proxy objects are quite rare. Proxy objects you'll need to care about +are even rarer. However we'll focus on two interesting examples in the +standard library: + +* `vec::Drain` +* `Rc` + + + + +## Drain + +`drain` is a collections API that moves data out of the container without +consuming the container. This enables us to reuse the allocation of a `Vec` +after claiming ownership over all of its contents. `drain` produces an iterator +(Drain) that returns the contents of the Vec by-value. + +Now, consider Drain in the middle of iteration: some values have been moved out, +and others haven't. This means that part of the Vec is now full of logically +uninitialized data! We could backshift all the elements in the Vec every time we +remove a value, but this would have pretty catastrophic performance consequences. + +Instead, we would like Drain to *fix* the Vec's backing storage when it is +dropped. It should run itself to completion, backshift any elements that weren't +removed (drain supports subranges), and then fix Vec's `len`. It's even +unwinding-safe! Easy! + +Now consider the following: + +``` +let mut vec = vec![Box::new(0); 4]; + +{ + // start draining, vec can no longer be accessed + let mut drainer = vec.drain(..); + + // pull out two elements and immediately drop them + drainer.next(); + drainer.next(); + + // get rid of drainer, but don't call its destructor + mem::forget(drainer); +} + +// Oops, vec[0] was dropped, we're reading a pointer into freed memory! +println!("{}", vec[0]); +``` + +This is pretty clearly Not Good. Unfortunately, we're kind of stuck between +a rock and a hard place: maintaining consistent state at every step has +an enormous cost (and would negate any benefits of the API). 
Failing to maintain +consistent state gives us Undefined Behaviour in safe code (making the API +unsound). + +So what can we do? Well, we can pick a trivially consistent state: set the Vec's +len to be 0 when we *start* the iteration, and fix it up if necessary in the +destructor. That way, if everything executes like normal we get the desired +behaviour with minimal overhead. But if someone has the *audacity* to mem::forget +us in the middle of the iteration, all that does is *leak even more* (and possibly +leave the Vec in an *unexpected* but consistent state). Since we've +accepted that mem::forget is safe, this is definitely safe. We call leaks causing +more leaks a *leak amplification*. + + + + +## Rc + +Rc is an interesting case because at first glance it doesn't appear to be a +proxy value at all. After all, it manages the data it points to, and dropping +all the Rcs for a value will drop that value. Leaking an Rc doesn't seem like +it would be particularly dangerous. It will leave the refcount permanently +incremented and prevent the data from being freed or dropped, but that seems +just like Box, right? + +Nope. + +Let's consider a simplified implementation of Rc: + +```rust +struct Rc<T> { + ptr: *mut RcBox<T>, +} + +struct RcBox<T> { + data: T, + ref_count: usize, +} + +impl<T> Rc<T> { + fn new(data: T) -> Self { + unsafe { + // Wouldn't it be nice if heap::allocate worked like this? 
+ let ptr = heap::allocate::<RcBox<T>>(); + ptr::write(ptr, RcBox { + data: data, + ref_count: 1, + }); + Rc { ptr: ptr } + } + } + + fn clone(&self) -> Self { + unsafe { + (*self.ptr).ref_count += 1; + } + Rc { ptr: self.ptr } + } +} + +impl<T> Drop for Rc<T> { + fn drop(&mut self) { + unsafe { + (*self.ptr).ref_count -= 1; + if (*self.ptr).ref_count == 0 { + // drop the data and then free it + ptr::read(self.ptr); + heap::deallocate(self.ptr); + } + } + } +} +``` + +This code contains an implicit and subtle assumption: ref_count can fit in a +`usize`, because there can't be more than `usize::MAX` Rcs in memory. However +this itself assumes that the ref_count accurately reflects the number of Rcs +in memory, which we know is false with mem::forget. Using mem::forget we can +overflow the ref_count, and then get it down to 0 with outstanding Rcs. Then we +can happily use-after-free the inner data. Bad Bad Not Good. + +This can be solved by *saturating* the ref_count, which is sound because +decreasing the refcount by `n` still requires `n` Rcs simultaneously living +in memory. + + + + +## thread::scoped + +The thread::scoped API intends to allow threads to be spawned that reference +data on the stack without any synchronization over that data. Usage looked like: + +```rust +let mut data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; +{ + let mut guards = vec![]; + for x in &mut data { + // Move the mutable reference into the closure, and execute + // it on a different thread. The closure has a lifetime bound + // by the lifetime of the mutable reference `x` we store in it. + // The guard that is returned is in turn assigned the lifetime + // of the closure, so it also mutably borrows `data` as `x` did. + // This means we cannot access `data` until the guard goes away. 
+ let guard = thread::scoped(move || { + *x *= 2; + }); + // store the thread's guard for later + guards.push(guard); + } + // All guards are dropped here, forcing the threads to join + // (this thread blocks here until the others terminate). + // Once the threads join, the borrow expires and the data becomes + // accessible again in this thread. +} +// data is definitely mutated here. +``` + +In principle, this totally works! Rust's ownership system perfectly ensures it! +...except it relies on a destructor being called to be safe. + +``` +let mut data = Box::new(0); +{ + let guard = thread::scoped(|| { + // This is at best a data race. At worst, it's *also* a use-after-free. + *data += 1; + }); + // Because the guard is forgotten, expiring the loan without blocking this + // thread. + mem::forget(guard); +} +// So the Box is dropped here while the scoped thread may or may not be trying +// to access it. +``` -In the future, we expect there to be a first-class way to announce that a field -should be automatically dropped. +Dang. Here the destructor running was pretty fundamental to the API, and it had +to be scrapped in favour of a completely different design. [uninit]: uninitialized.html \ No newline at end of file