add md files

10 years ago · c468eafbd4
parent c5d773d66d
commit c468eafbd4
3 changed files with 371 additions and 0 deletions
--- a/lifetimes.md
+++ b/lifetimes.md
@@ -0,0 +1,13 @@
 % Advanced Lifetimes
 Lifetimes are the breakout feature of Rust.
 # Safe Rust
 * no aliasing of &mut
 # Unsafe Rust
 * Splitting lifetimes into disjoint regions
 * Creating lifetimes from raw pointers
 *
--- a/raii.md
+++ b/raii.md
@@ -0,0 +1,179 @@
 % The Perils Of RAII
 Ownership Based Resource Management (AKA RAII: Resource Acquisition is Initialization) is
 something you'll interact with a lot in Rust. Especially if you use the standard library.
 Roughly speaking the pattern is as follows: to acquire a resource, you create an object that
 manages it. To release the resource, you simply destroy the object, and it cleans up the
 resource for you. The most common "resource"
 this pattern manages is simply *memory*. `Box`, `Rc`, and basically everything in
 `std::collections` is a convenience to enable correctly managing memory. This is particularly
 important in Rust because we have no pervasive GC to rely on for memory management. Which is the
 point, really: Rust is about control. However we are not limited to just memory.
 Pretty much every other system resource like a thread, file, or socket is exposed through
 this kind of API.
 So, how does RAII work in Rust? Unlike C++, Rust does not come with a slew of built-in
 kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors.
 This largely has to do with Rust's philosophy of being explicit.
 Move constructors are meaningless in Rust because we don't enable types to "care" about their
 location in memory. Every type must be ready for it to be blindly memcopied to somewhere else
 in memory. This means pure on-the-stack-but-still-movable intrusive linked lists are simply
 not happening in Rust (safely).
 Assignment and copy constructors similarly don't exist because move semantics are the *default*
 in Rust. At most `x = y` just moves the bits of y into the x variable. Rust does provide two
 facilities for going back to C++'s copy-oriented semantics: `Copy` and `Clone`. Clone is our
 moral equivalent of a copy constructor, but it's never implicitly invoked. You have to explicitly
 call `clone` on an element you want to be cloned. Copy is a special case of Clone where the
 implementation is just "duplicate the bitwise representation". Copy types *are* implicitly
 cloned whenever they're moved, but because of the definition of Copy this just means *not*
 treating the old copy as uninitialized; a no-op.
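 A minimal sketch of the three behaviours described above: an implicit copy, an explicit clone,
 and a plain move. `Point`, `Label`, and `copy_clone_demo` are made-up names for illustration.
 ```rust
 // `Point` and `Label` are hypothetical types used only for illustration.
 #[derive(Clone, Copy, Debug, PartialEq)]
 struct Point { x: i32, y: i32 }
 #[derive(Clone, Debug, PartialEq)]
 struct Label { text: String }
 fn copy_clone_demo() -> (Point, Point, Label, Label) {
 	let p1 = Point { x: 1, y: 2 };
 	let p2 = p1; // Copy: the bits are duplicated and p1 stays usable
 	let l1 = Label { text: String::from("a") };
 	let l2 = l1.clone(); // Clone: only happens because we asked for it
 	let l3 = l1; // move: l1 is now logically uninitialized
 	// println!("{:?}", l1); // would not compile: use of moved value
 	(p1, p2, l2, l3)
 }
 ```
 Uncommenting the `println!` line should produce a "use of moved value" style error, whereas `p1`
 can still be returned after being assigned to `p2`.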
 While Rust provides a `Default` trait for specifying the moral equivalent of a default
 constructor, it's incredibly rare for this trait to be used. This is because variables
 aren't implicitly initialized (see [working with uninitialized memory][uninit] for details).
 Default is basically only useful for generic programming.
 More often than not, in a concrete case a type will provide a static `new` method for any
 kind of "default" constructor. This has no relation to `new` in other languages and has no
 special meaning. It's just a naming convention.
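 As a rough illustration of the split between `Default` and `new`, here is a sketch with a
 hypothetical `Config` type and a hypothetical generic helper `make_pair`. Neither name comes from
 any library; they only exist to show where each mechanism tends to be used.
 ```rust
 // `Config` is a hypothetical type used only for illustration.
 #[derive(Debug, Default, PartialEq)]
 struct Config { retries: u32, verbose: bool }
 impl Config {
 	// `new` is an ordinary associated function; the name is only a convention.
 	fn new(retries: u32) -> Config {
 		Config { retries, verbose: false }
 	}
 }
 // Generic code is where `Default` is useful: it can produce a `T`
 // without knowing anything else about the type.
 fn make_pair<T: Default>() -> (T, T) {
 	(T::default(), T::default())
 }
 ```
 Concrete code would typically call `Config::new(3)`, while `make_pair::<Config>()` works for any
 type that implements `Default`.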
 What the language *does* provide is full-blown automatic destructors through the `Drop` trait,
 which provides the following method:
 ```rust
 fn drop(&mut self);
 ```
 This method gives the type time to somehow finish what it was doing. **After `drop` is run,
 Rust will recursively try to drop all of the fields of the `self` struct**. This is a
 convenience feature so that you don't have to write "destructor boilerplate" dropping
 children. **There is no way to prevent this in Rust 1.0**. Also note that `&mut self` means
 that even if you *could* suppress recursive Drop, Rust will prevent you from e.g. moving fields
 out of self. For most types, this is totally fine: they own all their data, there's no
 additional state passed into drop to try to send it to, and `self` is about to be marked as
 uninitialized (and therefore inaccessible).
 For instance, a custom implementation of `Box` might write `Drop` like this:
 ```rust
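 use std::alloc::{dealloc, Layout};
 use std::ptr;
 struct Box<T>{ ptr: *mut T }
 impl<T> Drop for Box<T> {
 	fn drop(&mut self) {
 		unsafe {
 			// run the destructor of the contents, then free the allocation
 			// (assumes a non-zero-sized T allocated by the global allocator)
 			ptr::drop_in_place(self.ptr);
 			dealloc(self.ptr as *mut u8, Layout::new::<T>());
 		}
 	}
 }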
 ```
 and this works fine because when Rust goes to drop the `ptr` field it just sees a `*mut` that
 has no actual `Drop` implementation. Similarly nothing can use-after-free the `ptr` because
 the Box is completely gone.
 However this wouldn't work:
 ```rust
 use std::alloc::{dealloc, Layout};
 use std::ptr;
 struct Box<T>{ ptr: *mut T }
 impl<T> Drop for Box<T> {
 	fn drop(&mut self) {
 		unsafe {
 			ptr::drop_in_place(self.ptr);
 			dealloc(self.ptr as *mut u8, Layout::new::<T>());
 		}
 	}
 }
 // `box` is a reserved word, so the field is named `my_box`
 struct SuperBox<T> { my_box: Box<T> }
 impl<T> Drop for SuperBox<T> {
 	fn drop(&mut self) {
 		unsafe {
 			// Hyper-optimized: deallocate the box's contents for it
 			// without `drop`ing the contents
 			dealloc(self.my_box.ptr as *mut u8, Layout::new::<T>());
 		}
 	}
 }
 ```
 because after we deallocate `my_box`'s ptr in SuperBox's destructor, Rust will
 happily proceed to tell the box to Drop itself and everything will blow up with
 use-after-frees and double-frees.
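 The ordering behind this problem can be observed without any unsafe code. The sketch below uses
 two hypothetical types, `Noisy` and `Parent`, that append to a shared log from their destructors;
 it is only meant to show that the outer `drop` runs first and the fields are dropped afterwards.
 ```rust
 use std::cell::RefCell;
 use std::rc::Rc;
 // Shared log that records the order in which destructors run.
 type Log = Rc<RefCell<Vec<&'static str>>>;
 // `Noisy` and `Parent` are hypothetical types used only for illustration.
 struct Noisy { name: &'static str, log: Log }
 impl Drop for Noisy {
 	fn drop(&mut self) {
 		self.log.borrow_mut().push(self.name);
 	}
 }
 struct Parent { log: Log, a: Noisy, b: Noisy }
 impl Drop for Parent {
 	fn drop(&mut self) {
 		// Runs first; the fields `a` and `b` are still intact here.
 		self.log.borrow_mut().push("parent");
 	}
 }
 fn drop_order() -> Vec<&'static str> {
 	let log: Log = Rc::new(RefCell::new(Vec::new()));
 	let parent = Parent {
 		log: log.clone(),
 		a: Noisy { name: "a", log: log.clone() },
 		b: Noisy { name: "b", log: log.clone() },
 	};
 	drop(parent); // Parent::drop, then each field in declaration order
 	let order = log.borrow().clone();
 	order
 }
 ```
 Calling `drop_order()` yields `["parent", "a", "b"]`: by the time `a` and `b` are dropped,
 `Parent::drop` has already finished, which is exactly the window in which `SuperBox` freed memory
 that its field still believed it owned.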
 Note that the recursive drop behaviour applies to *all* structs and enums
 regardless of whether they implement Drop. Therefore something like
 ```rust
 struct Boxy<T> {
 	data1: Box<T>,
 	data2: Box<T>,
 	info: u32,
 }
 ```
 will have the destructors of its data1 and data2 fields called whenever it "would" be
 dropped, even though it itself doesn't implement Drop. We say that such a type
 *needs Drop*, even though it is not itself Drop.
 Similarly,
 ```rust
 enum Link {
 	Next(Box<Link>),
 	None,
 }
 ```
 will have its inner Box field dropped *if and only if* a value stores the Next variant.
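 The standard library exposes this "needs Drop" notion through `std::mem::needs_drop`. A small
 sketch, reusing the two types above plus a hypothetical `Plain` struct that holds only integers.
 Note that `needs_drop` is documented as a hint that may conservatively answer `true`, so only the
 `true` results below are guaranteed.
 ```rust
 use std::mem;
 struct Boxy<T> { data1: Box<T>, data2: Box<T>, info: u32 }
 enum Link { Next(Box<Link>), None }
 // `Plain` is a hypothetical type with nothing to clean up.
 struct Plain { a: u32, b: u64 }
 fn needs_drop_report() -> (bool, bool, bool) {
 	(
 		// Neither type implements Drop, but both contain a Box.
 		mem::needs_drop::<Boxy<u8>>(),
 		mem::needs_drop::<Link>(),
 		// Expected to be false in practice, though the docs allow `true`.
 		mem::needs_drop::<Plain>(),
 	)
 }
 ```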
 In general this works really nicely because you don't need to worry about adding/removing
 dtors when you refactor your data layout. Still, there are certainly many valid use cases for
 needing to do trickier things with destructors.
 The classic safe solution to blocking recursive drop semantics and allowing moving out
 of Self is to use an Option:
 ```rust
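 use std::alloc::{dealloc, Layout};
 use std::{mem, ptr};
 struct Box<T>{ ptr: *mut T }
 impl<T> Drop for Box<T> {
 	fn drop(&mut self) {
 		unsafe {
 			ptr::drop_in_place(self.ptr);
 			dealloc(self.ptr as *mut u8, Layout::new::<T>());
 		}
 	}
 }
 // `box` is a reserved word, so the field is named `my_box`
 struct SuperBox<T> { my_box: Option<Box<T>> }
 impl<T> Drop for SuperBox<T> {
 	fn drop(&mut self) {
 		unsafe {
 			// Hyper-optimized: deallocate the box's contents for it
 			// without `drop`ing the contents. Need to set the `my_box`
 			// field to `None` to prevent Rust from trying to Drop it.
 			let my_box = self.my_box.take().unwrap();
 			dealloc(my_box.ptr as *mut u8, Layout::new::<T>());
 			// forget the moved-out box so its own destructor doesn't run
 			mem::forget(my_box);
 		}
 	}
 }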
 ```
 However this has fairly odd semantics: you're saying that a field that *should* always be Some
 may be None, just because that happens in the dtor. Of course this conversely makes a lot of sense:
 you can call arbitrary methods on self during the destructor, and this should prevent you from
 ever doing so after deinitializing the field. Not that it will prevent you from producing any other
 arbitrarily invalid state in there.
 On balance this is an ok choice. Certainly if you're just getting started.
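 The same Option trick shows up in entirely safe code whenever a destructor needs to consume a
 field by value. A minimal sketch, assuming a hypothetical `Worker` type that owns a thread handle
 (`JoinHandle::join` takes the handle by value, which `&mut self` alone would not allow):
 ```rust
 use std::thread::{self, JoinHandle};
 // `Worker` is a hypothetical type used only for illustration.
 struct Worker { handle: Option<JoinHandle<u32>>, result: Option<u32> }
 impl Worker {
 	fn spawn(input: u32) -> Worker {
 		Worker { handle: Some(thread::spawn(move || input * 2)), result: None }
 	}
 	// Move the handle out, leaving `None` behind, then wait for the thread.
 	fn finish(&mut self) -> Option<u32> {
 		if let Some(handle) = self.handle.take() {
 			self.result = handle.join().ok();
 		}
 		self.result
 	}
 }
 impl Drop for Worker {
 	fn drop(&mut self) {
 		// Fine to call even if `finish` already ran: the field is `None` by then.
 		let _ = self.finish();
 	}
 }
 ```
 The cost is the one described above: every other method has to cope with a `handle` that is
 `None`, even though that state only really exists to serve the destructor.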
 In the future, we expect there to be a first-class way to announce that a field
 should *not* be automatically dropped.
 [uninit]:
--- a/uninitialized.md
+++ b/uninitialized.md
@@ -0,0 +1,179 @@
 % Working With Uninitialized Memory
 All runtime-allocated memory in a Rust program begins its life as *uninitialized*. In this state the value of the memory is an indeterminate pile of bits that may or may not even reflect a valid state for the type that is supposed to inhabit that location of memory. Attempting to interpret this memory as a value of *any* type will cause Undefined Behaviour. Do Not Do This.
 Like C, all stack variables in Rust begin their life as uninitialized until a value is explicitly assigned to them. Unlike C, Rust statically prevents you from ever reading them until you do:
 ```rust
 fn main() {
 	let x: i32;
 	println!("{}", x);
 }
 ```
 ```text
 src/main.rs:3:20: 3:21 error: use of possibly uninitialized variable: `x`
 src/main.rs:3     println!("{}", x);
                                 ^
 ```
 This is based off of a basic branch analysis: every branch must assign a value to `x` before it
 is first used. Interestingly, Rust doesn't require the variable to be mutable to perform a delayed initialization if every branch assigns exactly once. However the analysis does not take advantage of constant analysis or anything like that. So this compiles:
 ```rust
 fn main() {
 	let x: i32;
 	let y: i32;
 	y = 1;
 	if true {
 		x = 1;
 	} else {
 		x = 2;
 	}
 	println!("{} {}", x, y);
 }
 ```
 but this doesn't:
 ```rust
 fn main() {
 	let x: i32;
 	if true {
 		x = 1;
 	}
 	println!("{}", x);
 }
 ```
 ```text
 src/main.rs:6:17: 6:18 error: use of possibly uninitialized variable: `x`
 src/main.rs:6 	println!("{}", x);
 ```
 while this does:
 ```rust
 fn main() {
 	let x: i32;
 	if true {
 		x = 1;
 		println!("{}", x);
 	}
 	// Don't care that there are branches where it's not initialized
 	// since we don't use the value in those branches
 }
 ```
 If a value is moved out of a variable, that variable becomes logically uninitialized if the type
 of the value isn't Copy. That is:
 ```rust
 fn main() {
 	let x = 0;
 	let y = Box::new(0);
 	let z1 = x; // x is still valid because i32 is Copy
 	let z2 = y; // y has once more become logically uninitialized, since Box is not Copy
 }
 ```
 However reassigning `y` in this example *would* require `y` to be marked as mutable, as a
 Safe Rust program could observe that the value of `y` changed. Otherwise the variable is
 exactly like new.
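 A small sketch of that reinitialization, using a hypothetical function `reinitialize`:
 ```rust
 fn reinitialize() -> Box<i32> {
 	let mut y = Box::new(0);
 	let z = y; // moves the Box; y is now logically uninitialized
 	// let w = y; // would not compile: use of moved value
 	y = Box::new(*z + 1); // fine: y is `mut`, and there is no old value to drop
 	y
 }
 ```
 Removing the `mut` should make the second assignment a compile error, since an immutable
 variable may only be assigned once.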
 This raises an interesting question with respect to `Drop`: where does Rust
 try to call the destructor of a variable that is conditionally initialized?
 It turns out that Rust actually tracks whether a variable should be dropped or not *at runtime*. As a
 variable becomes initialized and uninitialized, a *drop flag* for that variable is set and unset.
 When a variable goes out of scope or is assigned to, Rust checks the flag to decide whether the current
 value of the variable should be dropped. Of course, static analysis can remove these checks. If the compiler
 can prove that a value is guaranteed to be either initialized or not, then it can theoretically
 generate more efficient code! As such it may be desirable to structure code to have *static drop
 semantics* when possible.
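 To make the question concrete, here is a sketch with a hypothetical `Counted` type that counts
 destructor runs. Whether `x` is initialized at the end of its scope depends on a runtime value,
 so the decision to run its destructor cannot be made purely at compile time.
 ```rust
 use std::cell::Cell;
 // `Counted` is a hypothetical type that counts how often its destructor runs.
 struct Counted<'a>(&'a Cell<u32>);
 impl<'a> Drop for Counted<'a> {
 	fn drop(&mut self) {
 		self.0.set(self.0.get() + 1);
 	}
 }
 fn drops_when(condition: bool) -> u32 {
 	let drops = Cell::new(0);
 	{
 		let x: Counted<'_>;
 		if condition {
 			x = Counted(&drops);
 			let _still_alive = &x; // touch x so it is not reported as unused
 		}
 		// x goes out of scope here. Its destructor must run only if the
 		// branch above was taken.
 	}
 	drops.get()
 }
 ```
 `drops_when(true)` returns 1 and `drops_when(false)` returns 0.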
 As of Rust 1.0, the drop flags are actually not-so-secretly stashed in a secret field of any type
 that implements Drop. The language sets the drop flag by overwriting the entire struct with a
 particular value. This is pretty obviously Not The Fastest and causes a bunch of trouble with
 optimizing code. As such work is currently under way to move the flags out onto the stack frame
 where they more reasonably belong. Unfortunately this work will take some time as it requires
 fairly substantial changes to the compiler.
 So in general, Rust programs don't need to worry about uninitialized values on the stack for
 correctness. Although they might care for performance. Thankfully, Rust makes it easy to take
 control here! Uninitialized values are there, and Safe Rust lets you work with them, but you're
 never in trouble.
 One interesting exception to this rule is working with arrays. Safe Rust doesn't permit you to
 partially initialize an array. When you initialize an array, you can either set every value to the
 same thing with `let x = [val; N]`, or you can specify each member individually with
 `let x = [val1, val2, val3]`. Unfortunately this is pretty rigid, especially if you need
 to initialize your array in a more incremental or dynamic way.
 Unsafe Rust gives us a powerful tool to handle this problem: `std::mem::uninitialized`.
 This function pretends to return a value when really it does nothing at all. Using it, we can
 convince Rust that we have initialized a variable, allowing us to do trickier things with
 conditional and incremental initialization.
 Unfortunately, this raises a tricky problem. Assignment has a different meaning to Rust based on
 whether it believes that a variable is initialized or not. If it's uninitialized, then Rust will
 semantically just memcopy the bits over the uninit ones, and do nothing else. However if Rust
 believes a value to be initialized, it will try to `Drop` the old value! Since we've tricked Rust
 into believing that the value is initialized, we can no longer safely use normal assignment.
 This is also a problem if you're working with a raw system allocator, which of course returns a
 pointer to uninitialized memory.
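 The "assignment drops the old value" half of this is easy to observe in Safe Rust. A sketch,
 again using a hypothetical destructor-counting type `Counted`:
 ```rust
 use std::cell::Cell;
 // `Counted` is a hypothetical type that counts how often its destructor runs.
 struct Counted<'a>(&'a Cell<u32>);
 impl<'a> Drop for Counted<'a> {
 	fn drop(&mut self) {
 		self.0.set(self.0.get() + 1);
 	}
 }
 fn assignment_drops() -> (u32, u32) {
 	let drops = Cell::new(0);
 	let mut x = Counted(&drops);
 	let before = x.0.get();
 	// x is initialized, so this assignment drops the old value
 	x = Counted(&drops);
 	let after = x.0.get();
 	(before, after)
 }
 ```
 `assignment_drops()` returns `(0, 1)`. If the "old value" were really uninitialized garbage, that
 implicit destructor call is what would go wrong.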
 To handle this, we must use the `std::ptr` module. In particular, it provides three functions that
 allow us to assign bytes to a location in memory without evaluating the old value: `write`, `copy`, and `copy_nonoverlapping`.
 * `ptr::write(ptr, val)` takes a `val` and moves it into the address pointed to by `ptr`.
 * `ptr::copy(src, dest, count)` copies the bits that `count` T's would occupy from src to dest. (This is equivalent to `memmove`. Note that the argument order is reversed!)
 * `ptr::copy_nonoverlapping(src, dest, count)` does what `copy` does, but a little faster on the
 assumption that the two ranges of memory don't overlap. (This is equivalent to `memcpy`. Note that the argument order is reversed!)
 It should go without saying that these functions, if misused, will cause serious havoc or just
 straight up Undefined Behaviour. The only things that these functions *themselves* require is that
 the locations you want to read and write are allocated. However the ways writing arbitrary bit
 patterns to arbitrary locations of memory can break things are basically uncountable!
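 A minimal sketch of the three functions on memory that is already valid, so that the results are
 easy to check. `ptr_demo` is a hypothetical helper written for this example.
 ```rust
 use std::ptr;
 fn ptr_demo() -> ([u32; 4], [u32; 4], String) {
 	let mut a = [1u32, 2, 3, 4];
 	let mut b = [0u32; 4];
 	let mut s = String::from("old");
 	unsafe {
 		// memcpy-like: a and b are distinct arrays, so the ranges cannot overlap.
 		ptr::copy_nonoverlapping(a.as_ptr(), b.as_mut_ptr(), 4);
 		// memmove-like: shift the first three elements of a right by one.
 		// Source and destination overlap, so `copy` is required.
 		let p = a.as_mut_ptr();
 		ptr::copy(p, p.add(1), 3);
 		// Overwrite s without dropping the old String. The old buffer is
 		// leaked, which is wasteful but not unsafe.
 		ptr::write(&mut s, String::from("new"));
 	}
 	(a, b, s)
 }
 ```
 After the call, `a` is `[1, 1, 2, 3]`, `b` is `[1, 2, 3, 4]`, and `s` is `"new"`.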
 Putting this all together, we get the following:
 ```rust
 fn main() {
 	use std::mem;
 	use std::ptr;
 	// size of the array is hard-coded but easy to change. This means we can't
 	// use [a, b, c] syntax to initialize the array, though!
 	const SIZE: usize = 10;
 	let mut x: [Box<u32>; SIZE];
 	unsafe {
 		// convince Rust that x is Totally Initialized
 		x = mem::uninitialized();
 		for i in 0..SIZE {
 			// very carefully overwrite each index without reading it
 			ptr::write(&mut x[i], Box::new(i as u32));
 		}
 	}
 	// arrays don't implement Display, so print with the Debug formatter
 	println!("{:?}", x);
 }
 ```
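 In later versions of Rust, `mem::uninitialized` was deprecated in favour of `std::mem::MaybeUninit`,
 because the compiler is allowed to assume that a value such as a `Box` is always properly
 initialized. The following is a sketch of the same incremental initialization written against
 that newer API; `boxed_array` is a hypothetical helper, not something from the text above.
 ```rust
 use std::mem::{self, MaybeUninit};
 const SIZE: usize = 10;
 fn boxed_array() -> [Box<u32>; SIZE] {
 	// An array of `MaybeUninit` slots needs no initialization itself,
 	// because each slot is explicitly allowed to hold garbage.
 	let mut slots: [MaybeUninit<Box<u32>>; SIZE] =
 		unsafe { MaybeUninit::uninit().assume_init() };
 	for (i, slot) in slots.iter_mut().enumerate() {
 		// `write` stores the value without dropping the old contents.
 		slot.write(Box::new(i as u32));
 	}
 	// Every slot has been written, so reinterpreting the array is sound.
 	unsafe { mem::transmute::<_, [Box<u32>; SIZE]>(slots) }
 }
 ```
 If `Box::new` were to panic halfway through the loop, the already-written boxes would be leaked
 rather than dropped, since `MaybeUninit` never runs destructors on its own.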
 It's worth noting that you don't need to worry about ptr::write-style shenanigans with
 Plain Old Data (POD; types which don't implement Drop, nor contain Drop types),
 because Rust knows not to try to Drop them. Similarly you should be able to assign the POD
 fields of partially initialized structs directly.
 However when working with uninitialized memory you need to be ever vigilant for Rust trying to
 Drop values you make like this before they're fully initialized. So every control path through
 that variable's scope must initialize the value before it ends. *This includes code panicking*.
 Again, POD types need not worry.
 And that's about it for working with uninitialized memory! Basically nothing anywhere expects
 to be handed uninitialized memory, so if you're going to pass it around at all, be sure to be
 *really* careful.