mirror of https://github.com/rust-lang/nomicon
parent
c5d773d66d
commit
c468eafbd4
@ -0,0 +1,13 @@
|
||||
% Advanced Lifetimes
|
||||
|
||||
Lifetimes are the breakout feature of Rust.
|
||||
|
||||
# Safe Rust
|
||||
|
||||
* no aliasing of &mut
|
||||
|
||||
# Unsafe Rust
|
||||
|
||||
* Splitting lifetimes into disjoint regions
|
||||
* Creating lifetimes from raw pointers
|
||||
*
|
@ -0,0 +1,179 @@
|
||||
% The Perils Of RAII
|
||||
|
||||
Ownership Based Resource Management (AKA RAII: Resource Acquisition is Initialization) is
|
||||
something you'll interact with a lot in Rust. Especially if you use the standard library.
|
||||
|
||||
Roughly speaking the pattern is as follows: to acquire a resource, you create an object that
|
||||
manages it. To release the resource, you simply destroy the object, and it cleans up the
|
||||
resource for you. The most common "resource"
|
||||
this pattern manages is simply *memory*. `Box`, `Rc`, and basically everything in
|
||||
`std::collections` is a convenience to enable correctly managing memory. This is particularly
|
||||
important in Rust because we have no pervasive GC to rely on for memory management. Which is the
|
||||
point, really: Rust is about control. However we are not limited to just memory.
|
||||
Pretty much every other system resource like a thread, file, or socket is exposed through
|
||||
this kind of API.
|
||||
|
||||
So, how does RAII work in Rust? Unlike C++, Rust does not come with a slew on builtin
|
||||
kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors.
|
||||
This largely has to do with Rust's philosophy of being explicit.
|
||||
|
||||
Move constructors are meaningless in Rust because we don't enable types to "care" about their
|
||||
location in memory. Every type must be ready for it to be blindly memcopied to somewhere else
|
||||
in memory. This means pure on-the-stack-but-still-movable intrusive linked lists are simply
|
||||
not happening in Rust (safely).
|
||||
|
||||
Assignment and copy constructors similarly don't exist because move semantics are the *default*
|
||||
in rust. At most `x = y` just moves the bits of y into the x variable. Rust does provide two
|
||||
facilities for going back to C++'s copy-oriented semantics: `Copy` and `Clone`. Clone is our
|
||||
moral equivalent of copy constructor, but it's never implicitly invoked. You have to explicitly
|
||||
call `clone` on an element you want to be cloned. Copy is a special case of Clone where the
|
||||
implementation is just "duplicate the bitwise representation". Copy types *are* implicitely
|
||||
cloned whenever they're moved, but because of the definition of Copy this just means *not*
|
||||
treating the old copy as uninitialized; a no-op.
|
||||
|
||||
While Rust provides a `Default` trait for specifying the moral equivalent of a default
|
||||
constructor, it's incredibly rare for this trait to be used. This is because variables
|
||||
aren't implicitely initialized (see [working with uninitialized memory][uninit] for details).
|
||||
Default is basically only useful for generic programming.
|
||||
|
||||
More often than not, in a concrete case a type will provide a static `new` method for any
|
||||
kind of "default" constructor. This has no relation to `new` in other languages and has no
|
||||
special meaning. It's just a naming convention.
|
||||
|
||||
What the language *does* provide is full-blown automatic destructors through the `Drop` trait,
|
||||
which provides the following method:
|
||||
|
||||
```rust
|
||||
fn drop(&mut self);
|
||||
```
|
||||
|
||||
This method gives the type time to somehow finish what it was doing. **After `drop` is run,
|
||||
Rust will recursively try to drop all of the fields of the `self` struct**. This is a
|
||||
convenience feature so that you don't have to write "destructor boilerplate" dropping
|
||||
children. **There is no way to prevent this in Rust 1.0**. Also note that `&mut self` means
|
||||
that even if you *could* supress recursive Drop, Rust will prevent you from e.g. moving fields
|
||||
out of self. For most types, this is totally fine: they own all their data, there's no
|
||||
additional state passed into drop to try to send it to, and `self` is about to be marked as
|
||||
uninitialized (and therefore inaccessible).
|
||||
|
||||
For instance, a custom implementation of `Box` might write `Drop` like this:
|
||||
|
||||
```rust
|
||||
struct Box<T>{ ptr: *mut T }
|
||||
|
||||
impl<T> Drop for Box<T> {
|
||||
fn drop(&mut self) {
|
||||
unsafe {
|
||||
(*self.ptr).drop();
|
||||
heap::deallocate(self.ptr);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
and this works fine because when Rust goes to drop the `ptr` field it just sees a *mut that
|
||||
has no actual `Drop` implementation. Similarly nothing can use-after-free the `ptr` because
|
||||
the Box is completely gone.
|
||||
|
||||
However this wouldn't work:
|
||||
|
||||
```rust
|
||||
struct Box<T>{ ptr: *mut T }
|
||||
|
||||
impl<T> Drop for Box<T> {
|
||||
fn drop(&mut self) {
|
||||
unsafe {
|
||||
(*self.ptr).drop();
|
||||
heap::deallocate(self.ptr);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
struct SuperBox<T> { box: Box<T> }
|
||||
|
||||
impl<T> Drop for SuperBox<T> {
|
||||
fn drop(&mut self) {
|
||||
unsafe {
|
||||
// Hyper-optimized: deallocate the box's contents for it
|
||||
// without `drop`ing the contents
|
||||
heap::deallocate(self.box.ptr);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
because after we deallocate the `box`'s ptr in SuperBox's destructor, Rust will
|
||||
happily proceed to tell the box to Drop itself and everything will blow up with
|
||||
use-after-frees and double-frees.
|
||||
|
||||
Note that the recursive drop behaviour applies to *all* structs and enums
|
||||
regardless of whether they implement Drop. Therefore something like
|
||||
|
||||
```rust
|
||||
struct Boxy<T> {
|
||||
data1: Box<T>,
|
||||
data2: Box<T>,
|
||||
info: u32,
|
||||
}
|
||||
```
|
||||
|
||||
will have its data1 and data2's fields destructors whenever it "would" be
|
||||
dropped, even though it itself doesn't implement Drop. We say that such a type
|
||||
*needs Drop*, even though it is not itself Drop.
|
||||
|
||||
Similarly,
|
||||
|
||||
```rust
|
||||
enum Link {
|
||||
Next(Box<Link>),
|
||||
None,
|
||||
}
|
||||
```
|
||||
|
||||
will have its inner Box field dropped *if and only if* a value stores the Next variant.
|
||||
|
||||
In general this works really nice because you don't need to worry about adding/removing
|
||||
dtors when you refactor your data layout. Still there's certainly many valid usecases for
|
||||
needing to do trickier things with destructors.
|
||||
|
||||
The classic safe solution to blocking recursive drop semantics and allowing moving out
|
||||
of Self is to use an Option:
|
||||
|
||||
```rust
|
||||
struct Box<T>{ ptr: *mut T }
|
||||
|
||||
impl<T> Drop for Box<T> {
|
||||
fn drop(&mut self) {
|
||||
unsafe {
|
||||
(*self.ptr).drop();
|
||||
heap::deallocate(self.ptr);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
struct SuperBox<T> { box: Option<Box<T>> }
|
||||
|
||||
impl<T> Drop for SuperBox<T> {
|
||||
fn drop(&mut self) {
|
||||
unsafe {
|
||||
// Hyper-optimized: deallocate the box's contents for it
|
||||
// without `drop`ing the contents. Need to set the `box`
|
||||
// fields as `None` to prevent Rust from trying to Drop it.
|
||||
heap::deallocate(self.box.take().unwrap().ptr);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
However this has fairly odd semantics: you're saying that a field that *should* always be Some
|
||||
may be None, just because that happens in the dtor. Of course this conversely makes a lot of sense:
|
||||
you can call arbitrary methods on self during the destructor, and this should prevent you from
|
||||
ever doing so after deinitializing the field. Not that it will prevent you from producing any other
|
||||
arbitrarily invalid state in there.
|
||||
|
||||
On balance this is an ok choice. Certainly if you're just getting started.
|
||||
|
||||
In the future, we expect there to be a first-class way to announce that a field
|
||||
should be automatically dropped.
|
||||
|
||||
[uninit]:
|
@ -0,0 +1,179 @@
|
||||
% Working With Uninitialized Memory
|
||||
|
||||
All runtime-allocated memory in a Rust program begins its life as *uninitialized*. In this state the value of the memory is an indeterminate pile of bits that may or may not even reflect a valid state for the type that is supposed to inhabit that location of memory. Attempting to interpret this memory as a value of *any* type will cause Undefined Behaviour. Do Not Do This.
|
||||
|
||||
Like C, all stack variables in Rust begin their life as uninitialized until a value is explicitly assigned to them. Unlike C, Rust statically prevents you from ever reading them until you do:
|
||||
|
||||
```rust
|
||||
fn main() {
|
||||
let x: i32;
|
||||
println!("{}", x);
|
||||
}
|
||||
```
|
||||
|
||||
```text
|
||||
src/main.rs:3:20: 3:21 error: use of possibly uninitialized variable: `x`
|
||||
src/main.rs:3 println!("{}", x);
|
||||
^
|
||||
```
|
||||
|
||||
This is based off of a basic branch analysis: every branch must assign a value to `x` before it
|
||||
is first used. Interestingly, Rust doesn't require the variable to be mutable to perform a delayed initialization if every branch assigns exactly once. However the analysis does not take advantage of constant analysis or anything like that. So this compiles:
|
||||
|
||||
```rust
|
||||
fn main() {
|
||||
let x: i32;
|
||||
let y: i32;
|
||||
|
||||
y = 1;
|
||||
|
||||
if true {
|
||||
x = 1;
|
||||
} else {
|
||||
x = 2;
|
||||
}
|
||||
|
||||
println!("{} {}", x, y);
|
||||
}
|
||||
```
|
||||
|
||||
but this doesn't:
|
||||
|
||||
```rust
|
||||
fn main() {
|
||||
let x: i32;
|
||||
if true {
|
||||
x = 1;
|
||||
}
|
||||
println!("{}", x);
|
||||
}
|
||||
```
|
||||
|
||||
```text
|
||||
src/main.rs:6:17: 6:18 error: use of possibly uninitialized variable: `x`
|
||||
src/main.rs:6 println!("{}", x);
|
||||
```
|
||||
|
||||
while this does:
|
||||
|
||||
```rust
|
||||
fn main() {
|
||||
let x: i32;
|
||||
if true {
|
||||
x = 1;
|
||||
println!("{}", x);
|
||||
}
|
||||
// Don't care that there are branches where it's not initialized
|
||||
// since we don't use the value in those branches
|
||||
}
|
||||
```
|
||||
|
||||
If a value is moved out of a variable, that variable becomes logically uninitialized if the type
|
||||
of the value isn't Copy. That is:
|
||||
|
||||
```rust
|
||||
fn main() {
|
||||
let x = 0;
|
||||
let y = Box::new(0);
|
||||
let z1 = x; // x is still valid because i32 is Copy
|
||||
let z2 = y; // y has once more become logically uninitialized, since Box is not Copy
|
||||
}
|
||||
```
|
||||
|
||||
However reassigning `y` in this example *would* require `y` to be marked as mutable, as a
|
||||
Safe Rust program could observe that the value of `y` changed. Otherwise the variable is
|
||||
exactly like new.
|
||||
|
||||
This raises an interesting question with respect to `Drop`: where does Rust
|
||||
try to call the destructor of a variable that is conditionally initialized?
|
||||
It turns out that Rust actually tracks whether a type should be dropped or not *at runtime*. As a
|
||||
variable becomes initialized and uninitialized, a *drop flag* for that variable is set and unset.
|
||||
When a variable goes out of scope or is assigned it evaluates whether the current value of the
|
||||
variable should be dropped. Of course, static analysis can remove these checks. If the compiler
|
||||
can prove that a value is guaranteed to be either initialized or not, then it can theoretically
|
||||
generate more efficient code! As such it may be desirable to structure code to have *static drop
|
||||
semantics* when possible.
|
||||
|
||||
As of Rust 1.0, the drop flags are actually not-so-secretly stashed in a secret field of any type
|
||||
that implements Drop. The language sets the drop flag by overwriting the entire struct with a
|
||||
particular value. This is pretty obviously Not The Fastest and causes a bunch of trouble with
|
||||
optimizing code. As such work is currently under way to move the flags out onto the stack frame
|
||||
where they more reasonably belong. Unfortunately this work will take some time as it requires
|
||||
fairly substantial changes to the compiler.
|
||||
|
||||
So in general, Rust programs don't need to worry about uninitialized values on the stack for
|
||||
correctness. Although they might care for performance. Thankfully, Rust makes it easy to take
|
||||
control here! Uninitialized values are there, and Safe Rust lets you work with them, but you're
|
||||
never in trouble.
|
||||
|
||||
One interesting exception to this rule is working with arrays. Safe Rust doesn't permit you to
|
||||
partially initialize an array. When you initialize an array, you can either set every value to the
|
||||
same thing with `let x = [val; N]`, or you can specify each member individually with
|
||||
`let x = [val1, val2, val3]`. Unfortunately this is pretty rigid, especially if you need
|
||||
to initialize your array in a more incremental or dynamic way.
|
||||
|
||||
Unsafe Rust gives us a powerful tool to handle this problem: `std::mem::uninitialized`.
|
||||
This function pretends to return a value when really it does nothing at all. Using it, we can
|
||||
convince Rust that we have initialized a variable, allowing us to do trickier things with
|
||||
conditional and incremental initialization.
|
||||
|
||||
Unfortunately, this raises a tricky problem. Assignment has a different meaning to Rust based on
|
||||
whether it believes that a variable is initialized or not. If it's uninitialized, then Rust will
|
||||
semantically just memcopy the bits over the uninit ones, and do nothing else. However if Rust
|
||||
believes a value to be initialized, it will try to `Drop` the old value! Since we've tricked Rust
|
||||
into believing that the value is initialized, we can no longer safely use normal assignment.
|
||||
|
||||
This is also a problem if you're working with a raw system allocator, which of course returns a
|
||||
pointer to uninitialized memory.
|
||||
|
||||
To handle this, we must use the `std::ptr` module. In particular, it provides three functions that
|
||||
allow us to assign bytes to a location in memory without evaluating the old value: `write`, `copy`, and `copy_nonoverlapping`.
|
||||
|
||||
* `ptr::write(ptr, val)` takes a `val` and moves it into the address pointed to by `ptr`.
|
||||
* `ptr::copy(src, dest, count)` copies the bits that `count` T's would occupy from src to dest. (this is equivalent to memmove -- note that the argument order is reversed!)
|
||||
* `ptr::copy_nonoverlapping(src, dest, count)` does what `copy` does, but a little faster on the
|
||||
assumption that the two ranges of memory don't overlap. (this is equivalent to memcopy -- note that the argument order is reversed!)
|
||||
|
||||
It should go without saying that these functions, if misused, will cause serious havoc or just
|
||||
straight up Undefined Behaviour. The only things that these functions *themselves* require is that
|
||||
the locations you want to read and write are allocated. However the ways writing arbitrary bit
|
||||
patterns to arbitrary locations of memory can break things are basically uncountable!
|
||||
|
||||
Putting this all together, we get the following:
|
||||
|
||||
```rust
|
||||
fn main() {
|
||||
use std::mem;
|
||||
|
||||
// size of the array is hard-coded but easy to change. This means we can't
|
||||
// use [a, b, c] syntax to initialize the array, though!
|
||||
const SIZE = 10;
|
||||
|
||||
let x: [Box<u32>; SIZE];
|
||||
|
||||
unsafe {
|
||||
// convince Rust that x is Totally Initialized
|
||||
x = mem::uninitialized();
|
||||
for i in 0..SIZE {
|
||||
// very carefully overwrite each index without reading it
|
||||
ptr::write(&mut x[i], Box::new(i));
|
||||
}
|
||||
}
|
||||
|
||||
println!("{}", x);
|
||||
}
|
||||
```
|
||||
|
||||
It's worth noting that you don't need to worry about ptr::write-style shenanigans with
|
||||
Plain Old Data (POD; types which don't implement Drop, nor contain Drop types),
|
||||
because Rust knows not to try to Drop them. Similarly you should be able to assign the POD
|
||||
fields of partially initialized structs directly.
|
||||
|
||||
However when working with uninitialized memory you need to be ever vigilant for Rust trying to
|
||||
Drop values you make like this before they're fully initialized. So every control path through
|
||||
that variable's scope must initialize the value before it ends. *This includes code panicking*.
|
||||
Again, POD types need not worry.
|
||||
|
||||
And that's about it for working with uninitialized memory! Basically nothing anywhere expects
|
||||
to be handed uninitialized memory, so if you're going to pass it around at all, be sure to be
|
||||
*really* careful.
|
Loading…
Reference in new issue