mirror of https://github.com/rust-lang/nomicon
parent
b0f30f264e
commit
c6c64270bf
@ -1,10 +1,42 @@
|
||||
# Summary
|
||||
|
||||
* [Meet Safe and Unsafe](meet-safe-and-unsafe.md)
|
||||
* [What Do Safe and Unsafe Mean](safe-unsafe-meaning.md)
|
||||
* [Working with Unsafe](working-with-unsafe.md)
|
||||
* [Data Layout](data.md)
|
||||
* [Ownership and Lifetimes](lifetimes.md)
|
||||
* [Conversions](conversions.md)
|
||||
* [repr(Rust)](repr-rust.md)
|
||||
* [Exotically Sized Types](exotic-sizes.md)
|
||||
* [Other reprs](other-reprs.md)
|
||||
* [Ownership](ownership.md)
|
||||
* [References](references.md)
|
||||
* [Lifetimes](lifetimes.md)
|
||||
* [Limits of lifetimes](lifetime-mismatch.md)
|
||||
* [Lifetime Elision](lifetime-elision.md)
|
||||
* [Unbounded Lifetimes](unbounded-lifetimes.md)
|
||||
* [Higher-Rank Trait Bounds](hrtb.md)
|
||||
* [Subtyping and Variance](subtyping.md)
|
||||
* [Misc](lifetime-misc.md)
|
||||
* [Type Conversions](conversions.md)
|
||||
* [Coercions](coercions.md)
|
||||
* [The Dot Operator](dot-operator.md)
|
||||
* [Casts](casts.md)
|
||||
* [Transmutes](transmutes.md)
|
||||
* [Uninitialized Memory](uninitialized.md)
|
||||
* [Ownership-oriented resource management (RAII)](raii.md)
|
||||
* [Checked](checked-uninit.md)
|
||||
* [Unchecked](unchecked-uninit.md)
|
||||
* [Ownership-Oriented Resource Management](raii.md)
|
||||
* [Constructors](constructors.md)
|
||||
* [Destructors](destructors.md)
|
||||
* [Leaking](leaking.md)
|
||||
* [Unwinding](unwinding.md)
|
||||
* [Concurrency](concurrency.md)
|
||||
* [Example: Implementing Vec](vec.md)
|
||||
* [Example: Implementing Vec](vec.md)
|
||||
* [Layout](vec-layout.md)
|
||||
* [Allocating](vec-alloc.md)
|
||||
* [Push and Pop](vec-push-pop.md)
|
||||
* [Deallocating](vec-dealloc.md)
|
||||
* [Deref](vec-deref.md)
|
||||
* [Insert and Remove](vec-insert-remove.md)
|
||||
* [IntoIter](vec-into-iter.md)
|
||||
* [Drain](vec-drain.md)
|
||||
* [Final Code](vec-final.md)
|
@ -0,0 +1,55 @@
|
||||
% Casts
|
||||
|
||||
Casts are a superset of coercions: every coercion can be explicitly invoked via a
|
||||
cast, but some conversions *require* a cast. These "true casts" are generally regarded
|
||||
as dangerous or problematic actions. True casts revolve around raw pointers and
|
||||
the primitive numeric types. True casts aren't checked.
|
||||
|
||||
Here's an exhaustive list of all the true casts. For brevity, we will use `*`
|
||||
to denote either a `*const` or `*mut`, and `integer` to denote any integral primitive:
|
||||
|
||||
* `*T as *U` where `T, U: Sized`
|
||||
* `*T as *U` TODO: explain unsized situation
|
||||
* `*T as integer`
|
||||
* `integer as *T`
|
||||
* `number as number`
|
||||
* `C-like-enum as integer`
|
||||
* `bool as integer`
|
||||
* `char as integer`
|
||||
* `u8 as char`
|
||||
* `&[T; n] as *const T`
|
||||
* `fn as *T` where `T: Sized`
|
||||
* `fn as integer`
|
||||
|
||||
where `&.T` and `*T` are references of either mutability,
|
||||
and where unsize_kind(`T`) is the kind of the unsize info
|
||||
in `T` - the vtable for a trait definition (e.g. `fmt::Display` or
|
||||
`Iterator`, not `Iterator<Item=u8>`) or a length (or `()` if `T: Sized`).
|
||||
|
||||
Note that lengths are not adjusted when casting raw slices -
|
||||
`*const [u16] as *const [u8]` creates a slice that only includes
|
||||
half of the original memory.
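
To make the note about raw slice lengths concrete, here is a minimal sketch. The function name `raw_slice_cast` and the sample array are only illustrative. It casts a raw `[u16]` slice pointer to a raw `[u8]` slice pointer and compares the byte size of the original data with the number of bytes visible through the cast pointer.

```rust
/// Casts a raw `[u16]` slice pointer to a raw `[u8]` slice pointer and
/// returns (byte size of the original array, bytes visible after the cast).
fn raw_slice_cast() -> (usize, usize) {
    let words: [u16; 4] = [1, 2, 3, 4];
    let wide: *const [u16] = &words[..];
    // The length metadata (4) is carried over unchanged by the cast.
    let narrow = wide as *const [u8];
    // Sound in this sketch: `u8` has alignment 1, and the first 4 bytes of
    // `words` are initialized and alive while `bytes` is in use.
    let bytes: &[u8] = unsafe { &*narrow };
    (std::mem::size_of_val(&words), bytes.len())
}
```

The array occupies 8 bytes, but the cast pointer still claims 4 elements, which now means 4 bytes.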
|
||||
|
||||
Casting is not transitive, that is, even if `e as U1 as U2` is a valid
|
||||
expression, `e as U2` is not necessarily so (in fact it will only be valid if
|
||||
`U1` coerces to `U2`).
|
||||
|
||||
For numeric casts, there are quite a few cases to consider:
|
||||
|
||||
* casting between two integers of the same size (e.g. i32 -> u32) is a no-op
|
||||
* casting from a larger integer to a smaller integer (e.g. u32 -> u8) will truncate
|
||||
* casting from a smaller integer to a larger integer (e.g. u8 -> u32) will
|
||||
* zero-extend if the source is unsigned
|
||||
* sign-extend if the source is signed
|
||||
* casting from a float to an integer will round the float towards zero
|
||||
* **NOTE: currently this will cause Undefined Behaviour if the rounded
|
||||
value cannot be represented by the target integer type**. This is a bug
|
||||
and will be fixed. (TODO: figure out what Inf and NaN do)
|
||||
* casting from an integer to float will produce the floating point representation
|
||||
of the integer, rounded if necessary (rounding strategy unspecified).
|
||||
* casting from an f32 to an f64 is perfect and lossless.
|
||||
* casting from an f64 to an f32 will produce the closest possible value
|
||||
(rounding strategy unspecified).
|
||||
* **NOTE: currently this will cause Undefined Behaviour if the value
|
||||
is finite but larger or smaller than the largest or smallest finite
|
||||
value representable by f32**. This is a bug and will be fixed.
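
The well-defined cases from the list above can be sketched as follows. The function name `numeric_casts` is hypothetical, and the example deliberately sticks to values that are representable in the target type, so it does not touch the problematic float cases called out in the notes.

```rust
/// Collects a few well-defined numeric casts so they can be inspected together.
fn numeric_casts() -> (u32, u8, u32, i32, i32, f64) {
    let same_size = -1i32 as u32;      // same width: the bits are reinterpreted
    let truncated = 0x1234u32 as u8;   // larger -> smaller: keeps the low 8 bits
    let zero_extended = 0xFFu8 as u32; // unsigned source: zero-extend
    let sign_extended = -1i8 as i32;   // signed source: sign-extend
    let toward_zero = -1.9f32 as i32;  // float -> integer: rounds toward zero
    let widened = 0.5f32 as f64;       // f32 -> f64: lossless
    (same_size, truncated, zero_extended, sign_extended, toward_zero, widened)
}
```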
|
@ -0,0 +1,109 @@
|
||||
% Checked Uninitialized Memory
|
||||
|
||||
Like C, all stack variables in Rust are uninitialized until a
|
||||
value is explicitly assigned to them. Unlike C, Rust statically prevents you
|
||||
from ever reading them until you do:
|
||||
|
||||
```rust
fn main() {
    let x: i32;
    println!("{}", x);
}
```
|
||||
|
||||
```text
src/main.rs:3:20: 3:21 error: use of possibly uninitialized variable: `x`
src/main.rs:3     println!("{}", x);
                                 ^
```
|
||||
|
||||
This is based off of a basic branch analysis: every branch must assign a value
|
||||
to `x` before it is first used. Interestingly, Rust doesn't require the variable
|
||||
to be mutable to perform a delayed initialization if every branch assigns
|
||||
exactly once. However the analysis does not take advantage of constant analysis
|
||||
or anything like that. So this compiles:
|
||||
|
||||
```rust
fn main() {
    let x: i32;

    if true {
        x = 1;
    } else {
        x = 2;
    }

    println!("{}", x);
}
```
|
||||
|
||||
but this doesn't:
|
||||
|
||||
```rust
fn main() {
    let x: i32;
    if true {
        x = 1;
    }
    println!("{}", x);
}
```
|
||||
|
||||
```text
src/main.rs:6:17: 6:18 error: use of possibly uninitialized variable: `x`
src/main.rs:6 println!("{}", x);
```
|
||||
|
||||
while this does:
|
||||
|
||||
```rust
fn main() {
    let x: i32;
    if true {
        x = 1;
        println!("{}", x);
    }
    // Don't care that there are branches where it's not initialized
    // since we don't use the value in those branches
}
```
|
||||
|
||||
If a value is moved out of a variable, that variable becomes logically
|
||||
uninitialized if the type of the value isn't Copy. That is:
|
||||
|
||||
```rust
fn main() {
    let x = 0;
    let y = Box::new(0);
    let z1 = x; // x is still valid because i32 is Copy
    let z2 = y; // y is now logically uninitialized because Box isn't Copy
}
```
|
||||
|
||||
However reassigning `y` in this example *would* require `y` to be marked as
|
||||
mutable, as a Safe Rust program could observe that the value of `y` changed.
|
||||
Otherwise the variable is exactly like new.
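
As a small sketch of that last point (the function name `move_then_reassign` is only illustrative), a moved-from variable can be given a fresh value and used again, provided it was declared `mut`:

```rust
/// Moves a `Box` out of a variable and then re-initializes that variable.
fn move_then_reassign() -> (i32, i32) {
    let mut y = Box::new(1); // `mut` is required because `y` is assigned again below
    let z = y;               // `y` is now logically uninitialized
    // println!("{}", y);    // would not compile: use of moved value
    y = Box::new(2);         // re-initialize; `y` is usable again
    (*z, *y)
}
```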
|
||||
|
||||
This raises an interesting question with respect to `Drop`: where does Rust try
|
||||
to call the destructor of a variable that is conditionally initialized? It turns
|
||||
out that Rust actually tracks whether a type should be dropped or not *at
|
||||
runtime*. As a variable becomes initialized and uninitialized, a *drop flag* for
|
||||
that variable is set and unset. When a variable goes out of scope or is assigned
|
||||
a value, it evaluates whether the current value of the variable should be dropped.
|
||||
Of course, static analysis can remove these checks. If the compiler can prove that
|
||||
a value is guaranteed to be either initialized or not, then it can theoretically
|
||||
generate more efficient code! As such it may be desirable to structure code to
|
||||
have *static drop semantics* when possible.
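
The following sketch shows a situation where the decision to run a destructor can only be made at runtime. `Noisy` and `drops_when` are hypothetical names introduced for illustration. The example only observes the effect (whether the destructor ran) and says nothing about how the compiler tracks it internally.

```rust
use std::cell::Cell;

/// Increments a shared counter when dropped, so drops can be observed.
struct Noisy<'a>(&'a Cell<u32>);

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns how many `Noisy` destructors ran for a conditionally
/// initialized variable.
fn drops_when(condition: bool) -> u32 {
    let counter = Cell::new(0);
    {
        let _x: Noisy;
        if condition {
            _x = Noisy(&counter);
        }
        // `_x` goes out of scope here. Whether its destructor must run
        // depends on `condition`, which is only known at runtime.
    }
    counter.get()
}
```

If the initialization were unconditional, the compiler could decide statically that the destructor always runs at the end of the scope.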
|
||||
|
||||
As of Rust 1.0, the drop flags are actually not-so-secretly stashed in a hidden
|
||||
field of any type that implements Drop. The language sets the drop flag by
|
||||
overwriting the entire struct with a particular value. This is pretty obviously
|
||||
Not The Fastest and causes a bunch of trouble with optimizing code. As such work
|
||||
is currently under way to move the flags out onto the stack frame where they
|
||||
more reasonably belong. Unfortunately this work will take some time as it
|
||||
requires fairly substantial changes to the compiler.
|
||||
|
||||
So in general, Rust programs don't need to worry about uninitialized values on
the stack for correctness, although they might care for performance. Thankfully,
Rust makes it easy to take control here! Uninitialized values are there, and
Safe Rust lets you work with them, but you're never in danger.
|
@ -0,0 +1,72 @@
|
||||
% Coercions
|
||||
|
||||
Types can implicitly be coerced to change in certain contexts. These changes are
|
||||
generally just *weakening* of types, largely focused around pointers and lifetimes.
|
||||
They mostly exist to make Rust "just work" in more cases, and are largely harmless.
|
||||
|
||||
Coercion is allowed between the following types:
|
||||
|
||||
* Subtyping: `T` to `U` if `T` is a [subtype](lifetimes.html#subtyping-and-variance)
|
||||
of `U`
|
||||
* Transitivity: `T_1` to `T_3` where `T_1` coerces to `T_2` and `T_2` coerces to `T_3`
|
||||
* Pointer Weakening:
|
||||
* `&mut T` to `&T`
|
||||
* `*mut T` to `*const T`
|
||||
* `&T` to `*const T`
|
||||
* `&mut T` to `*mut T`
|
||||
* Unsizing: `T` to `U` if `T` implements `CoerceUnsized<U>`
|
||||
|
||||
`CoerceUnsized<Pointer<U>> for Pointer<T> where T: Unsize<U>` is implemented
|
||||
for all pointer types (including smart pointers like Box and Rc). Unsize is
|
||||
only implemented automatically, and enables the following transformations:
|
||||
|
||||
* `[T; n]` => `[T]`
|
||||
* `T` => `Trait` where `T: Trait`
|
||||
* `SubTrait` => `Trait` where `SubTrait: Trait` (TODO: is this now implied by the previous?)
|
||||
* `Foo<..., T, ...>` => `Foo<..., U, ...>` where:
|
||||
* `T: Unsize<U>`
|
||||
* `Foo` is a struct
|
||||
* Only the last field has type `T`
|
||||
* `T` is not part of the type of any other fields
|
||||
|
||||
Coercions occur at a *coercion site*. Any location that is explicitly typed
|
||||
will cause a coercion to its type. If inference is necessary, the coercion will
|
||||
not be performed. Exhaustively, the coercion sites for an expression `e` to
|
||||
type `U` are:
|
||||
|
||||
* let statements, statics, and consts: `let x: U = e`
|
||||
* Arguments to functions: `takes_a_U(e)`
|
||||
* Any expression that will be returned: `fn foo() -> U { e }`
|
||||
* Struct literals: `Foo { some_u: e }`
|
||||
* Array literals: `let x: [U; 10] = [e, ..]`
|
||||
* Tuple literals: `let x: (U, ..) = (e, ..)`
|
||||
* The last expression in a block: `let x: U = { ..; e }`
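
A few of these coercion sites can be sketched in code. All function names here are hypothetical, and the trait object is written `dyn Display`, the current spelling of what this text writes as a bare `Trait`.

```rust
use std::fmt::Display;

/// Argument coercion site: the parameter type is explicit.
fn takes_shared(x: &i32) -> i32 {
    *x
}

/// Return coercion site: `&[i32; 3]` is unsized to `&[i32]`.
fn as_slice(array: &[i32; 3]) -> &[i32] {
    array
}

fn coercion_examples() -> (i32, usize, String) {
    let mut n = 5;
    // Pointer weakening: `&mut i32` to `&i32` at a function argument.
    let through_arg = takes_shared(&mut n);
    // Unsizing at a `let` with an explicit type: `Box<i32>` to `Box<dyn Display>`.
    let shown: Box<dyn Display> = Box::new(7);
    (through_arg, as_slice(&[1, 2, 3]).len(), shown.to_string())
}
```

In each case the target type is written out explicitly, which is what makes the location a coercion site.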
|
||||
|
||||
Note that we do not perform coercions when matching traits (except for
|
||||
receivers, see below). If there is an impl for some type `U` and `T` coerces to
|
||||
`U`, that does not constitute an implementation for `T`. For example, the
|
||||
following will not type check, even though it is OK to coerce `t` to `&T` and
|
||||
there is an impl for `&T`:
|
||||
|
||||
```rust
trait Trait {}

fn foo<X: Trait>(t: X) {}

impl<'a> Trait for &'a i32 {}


fn main() {
    let t: &mut i32 = &mut 0;
    foo(t);
}
```
|
||||
|
||||
```text
<anon>:10:5: 10:8 error: the trait `Trait` is not implemented for the type `&mut i32` [E0277]
<anon>:10     foo(t);
              ^~~
```
|
@ -0,0 +1,26 @@
|
||||
% Constructors
|
||||
|
||||
Unlike C++, Rust does not come with a slew of builtin
|
||||
kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors.
|
||||
This largely has to do with Rust's philosophy of being explicit.
|
||||
|
||||
Move constructors are meaningless in Rust because we don't enable types to "care" about their
|
||||
location in memory. Every type must be ready for it to be blindly memcopied to somewhere else
|
||||
in memory. This means pure on-the-stack-but-still-movable intrusive linked lists are simply
|
||||
not happening in Rust (safely).
|
||||
|
||||
Assignment and copy constructors similarly don't exist because move semantics are the *default*
|
||||
in Rust. At most `x = y` just moves the bits of y into the x variable. Rust does provide two
|
||||
facilities for going back to C++'s copy-oriented semantics: `Copy` and `Clone`. Clone is our
|
||||
moral equivalent of a copy constructor, but it's never implicitly invoked. You have to explicitly
|
||||
call `clone` on an element you want to be cloned. Copy is a special case of Clone where the
|
||||
implementation is just "copy the bits". Copy types *are* implicitly
|
||||
cloned whenever they're moved, but because of the definition of Copy this just means *not*
|
||||
treating the old copy as uninitialized -- a no-op.
|
||||
|
||||
While Rust provides a `Default` trait for specifying the moral equivalent of a default
|
||||
constructor, it's incredibly rare for this trait to be used. This is because variables
|
||||
[aren't implicitly initialized][uninit]. Default is basically only useful for generic
|
||||
programming. In concrete contexts, a type will provide a static `new` method for any
|
||||
kind of "default" constructor. This has no relation to `new` in other
|
||||
languages and has no special meaning. It's just a naming convention.
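
A short sketch of how these pieces usually fit together. The `Config` type and its fields are hypothetical, chosen only to illustrate the conventions described above.

```rust
/// `Clone` plays the role of an explicit copy constructor and `Default`
/// that of an explicit default constructor. Neither is invoked implicitly.
#[derive(Clone, Debug, Default, PartialEq)]
struct Config {
    name: String,
    retries: u32,
}

impl Config {
    /// `new` is only a naming convention, not a language feature.
    fn new(name: &str) -> Config {
        Config { name: name.to_string(), retries: 3 }
    }
}

fn constructor_examples() -> (Config, Config, Config) {
    let original = Config::new("primary");
    let copy = original.clone(); // explicit deep copy
    let moved = original;        // plain move: `original` is now unusable
    (copy, moved, Config::default())
}
```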
|
@ -0,0 +1,140 @@
|
||||
% Destructors
|
||||
|
||||
What the language *does* provide is full-blown automatic destructors through the `Drop` trait,
|
||||
which provides the following method:
|
||||
|
||||
```rust
fn drop(&mut self);
```
|
||||
|
||||
This method gives the type time to somehow finish what it was doing. **After `drop` is run,
|
||||
Rust will recursively try to drop all of the fields of `self`**. This is a
|
||||
convenience feature so that you don't have to write "destructor boilerplate" to drop
|
||||
children. If a struct has no special logic for being dropped other than dropping its
|
||||
children, then it means `Drop` doesn't need to be implemented at all!
|
||||
|
||||
**There is no stable way to prevent this behaviour in Rust 1.0**.
|
||||
|
||||
Note that taking `&mut self` means that even if you *could* suppress recursive Drop,
|
||||
Rust will prevent you from e.g. moving fields out of self. For most types, this
|
||||
is totally fine.
|
||||
|
||||
For instance, a custom implementation of `Box` might write `Drop` like this:
|
||||
|
||||
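```rust
use std::alloc::{dealloc, Layout};
use std::ptr;

struct Box<T>{ ptr: *mut T }

impl<T> Drop for Box<T> {
    fn drop(&mut self) {
        unsafe {
            // Assumes `ptr` was allocated by the global allocator with
            // `Layout::new::<T>()` and that `T` is not zero-sized.
            ptr::drop_in_place(self.ptr);
            dealloc(self.ptr as *mut u8, Layout::new::<T>());
        }
    }
}
```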
|
||||
|
||||
and this works fine because when Rust goes to drop the `ptr` field it just sees a `*mut` that
|
||||
has no actual `Drop` implementation. Similarly nothing can use-after-free the `ptr` because
|
||||
the Box is immediately marked as uninitialized.
|
||||
|
||||
However this wouldn't work:
|
||||
|
||||
```rust
use std::alloc::{dealloc, Layout};
use std::ptr;

struct Box<T>{ ptr: *mut T }

impl<T> Drop for Box<T> {
    fn drop(&mut self) {
        unsafe {
            // Assumes `ptr` was allocated by the global allocator with
            // `Layout::new::<T>()` and that `T` is not zero-sized.
            ptr::drop_in_place(self.ptr);
            dealloc(self.ptr as *mut u8, Layout::new::<T>());
        }
    }
}

// `box` is a reserved word, so the field is named `my_box`.
struct SuperBox<T> { my_box: Box<T> }

impl<T> Drop for SuperBox<T> {
    fn drop(&mut self) {
        unsafe {
            // Hyper-optimized: deallocate the box's contents for it
            // without `drop`ing the contents
            dealloc(self.my_box.ptr as *mut u8, Layout::new::<T>());
        }
    }
}
```

After we deallocate `my_box`'s ptr in SuperBox's destructor, Rust will
|
||||
happily proceed to tell the box to Drop itself and everything will blow up with
|
||||
use-after-frees and double-frees.
|
||||
|
||||
Note that the recursive drop behaviour applies to *all* structs and enums
|
||||
regardless of whether they implement Drop. Therefore something like
|
||||
|
||||
```rust
struct Boxy<T> {
    data1: Box<T>,
    data2: Box<T>,
    info: u32,
}
```
|
||||
|
||||
will have the destructors of its data1 and data2 fields called whenever it "would" be
|
||||
dropped, even though it itself doesn't implement Drop. We say that such a type
|
||||
*needs Drop*, even though it is not itself Drop.
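
The recursive behaviour can be observed with a small sketch. `Logged`, `Pair`, and `drop_pair` are hypothetical names. `Pair` has no `Drop` implementation of its own, yet dropping it runs the destructors of both fields.

```rust
use std::cell::RefCell;

/// Records its label in a shared log when dropped.
struct Logged<'a> {
    label: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Drop for Logged<'a> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

/// Has no `Drop` impl of its own, but still "needs Drop".
struct Pair<'a> {
    first: Logged<'a>,
    second: Logged<'a>,
}

fn drop_pair() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _pair = Pair {
            first: Logged { label: "first", log: &log },
            second: Logged { label: "second", log: &log },
        };
        // `_pair` goes out of scope here and both fields are dropped.
    }
    log.into_inner()
}
```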
|
||||
|
||||
Similarly,
|
||||
|
||||
```rust
enum Link {
    Next(Box<Link>),
    None,
}
```
|
||||
|
||||
will have its inner Box field dropped *if and only if* an instance stores the Next variant.
|
||||
|
||||
In general this works really nicely because you don't need to worry about adding/removing
drops when you refactor your data layout. Still there are certainly many valid use cases for
needing to do trickier things with destructors.
|
||||
|
||||
The classic safe solution to overriding recursive drop and allowing moving out
|
||||
of Self during `drop` is to use an Option:
|
||||
|
||||
```rust
use std::alloc::{dealloc, Layout};
use std::{mem, ptr};

struct Box<T>{ ptr: *mut T }

impl<T> Drop for Box<T> {
    fn drop(&mut self) {
        unsafe {
            // Assumes `ptr` was allocated by the global allocator with
            // `Layout::new::<T>()` and that `T` is not zero-sized.
            ptr::drop_in_place(self.ptr);
            dealloc(self.ptr as *mut u8, Layout::new::<T>());
        }
    }
}

// `box` is a reserved word, so the field is named `my_box`.
struct SuperBox<T> { my_box: Option<Box<T>> }

impl<T> Drop for SuperBox<T> {
    fn drop(&mut self) {
        unsafe {
            // Hyper-optimized: deallocate the box's contents for it
            // without `drop`ing the contents. Need to set the `my_box`
            // field as `None` to prevent Rust from trying to Drop it.
            let my_box = self.my_box.take().unwrap();
            dealloc(my_box.ptr as *mut u8, Layout::new::<T>());
            // The taken Box must not run its own destructor either.
            mem::forget(my_box);
        }
    }
}
```
|
||||
|
||||
However this has fairly odd semantics: you're saying that a field that *should* always
|
||||
be Some may be None, just because that happens in the destructor. Of course this
|
||||
conversely makes a lot of sense: you can call arbitrary methods on self during
|
||||
the destructor, and this should prevent you from ever doing so after deinitializing
|
||||
the field. Not that it will prevent you from producing any other
|
||||
arbitrarily invalid state in there.
|
||||
|
||||
On balance this is an ok choice, and certainly what you should reach for by default.
|
||||
However, in the future we expect there to be a first-class way to announce that
|
||||
a field shouldn't be automatically dropped.
|
@ -0,0 +1,6 @@
|
||||
% The Dot Operator
|
||||
|
||||
The dot operator will perform a lot of magic to convert types. It will perform
|
||||
auto-referencing, auto-dereferencing, and coercion until types match.
|
||||
|
||||
TODO: steal information from http://stackoverflow.com/questions/28519997/what-are-rusts-exact-auto-dereferencing-rules/28552082#28552082
|
@ -0,0 +1,69 @@
|
||||
% Exotically Sized Types
|
||||
|
||||
Most of the time, we think in terms of types with a fixed, positive size. This
|
||||
is not always the case, however.
|
||||
|
||||
# Dynamically Sized Types (DSTs)
|
||||
|
||||
Rust also supports types without a statically known size. On the surface,
|
||||
this is a bit nonsensical: Rust *must* know the size of something in order to
|
||||
work with it! DSTs are generally produced as views, or through type-erasure
|
||||
of types that *do* have a known size. Due to their lack of a statically known
|
||||
size, these types can only exist *behind* some kind of pointer. They consequently
|
||||
produce a *fat* pointer consisting of the pointer and the information that
|
||||
*completes* them.
|
||||
|
||||
For instance, the slice type, `[T]`, is some statically unknown number of elements
|
||||
stored contiguously. `&[T]` consequently consists of a `(&T, usize)` pair that specifies
|
||||
where the slice starts, and how many elements it contains. Similarly, Trait Objects
|
||||
support interface-oriented type erasure through a `(data_ptr, vtable_ptr)` pair.
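
The extra word carried by a fat pointer shows up directly in its size. The sketch below uses a hypothetical helper, `pointer_widths`, and reports sizes as observed on current compilers. It should not be read as a guarantee about the exact layout of these pointers.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Sizes, in units of `usize`, of a thin reference, a slice reference,
/// and a trait object reference.
fn pointer_widths() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<&u8>() / word,        // thin: just the address
        size_of::<&[u8]>() / word,      // fat: address + length
        size_of::<&dyn Debug>() / word, // fat: address + vtable pointer
    )
}
```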
|
||||
|
||||
Structs can actually store a single DST directly as their last field, but this
|
||||
makes them a DST as well:
|
||||
|
||||
```rust
// Can't be stored on the stack directly
struct Foo {
    info: u32,
    data: [u8],
}
```
|
||||
|
||||
**NOTE: As of Rust 1.0 struct DSTs are broken if the last field has
|
||||
a variable position based on its alignment.**
|
||||
|
||||
|
||||
|
||||
# Zero Sized Types (ZSTs)
|
||||
|
||||
Rust actually allows types to be specified that occupy *no* space:
|
||||
|
||||
```rust
struct Foo; // No fields = no size
enum Bar {} // No variants = no size

// All fields have no size = no size
struct Baz {
    foo: Foo,
    bar: Bar,
    qux: (), // empty tuple has no size
}
```
|
||||
|
||||
On their own, ZSTs are, for obvious reasons, pretty useless. However
|
||||
as with many curious layout choices in Rust, their potential is realized in a generic
|
||||
context.
|
||||
|
||||
Rust largely understands that any operation that produces or stores a ZST
|
||||
can be reduced to a no-op. For instance, a `HashSet<T>` can be efficiently implemented
|
||||
as a thin wrapper around `HashMap<T, ()>` because all the operations `HashMap` normally
|
||||
does to store and retrieve keys will be completely stripped in monomorphization.
|
||||
|
||||
Similarly `Result<(), ()>` and `Option<()>` are effectively just fancy `bool`s.
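
These claims can be checked with `size_of`. The names `Marker`, `zst_sizes`, and `dedup_count` are hypothetical, and the reported sizes are what current compilers produce rather than a documented layout guarantee.

```rust
use std::collections::HashMap;
use std::mem::size_of;

struct Marker; // no fields

/// Sizes of a few zero-sized types, followed by two "fancy bools".
fn zst_sizes() -> [usize; 5] {
    [
        size_of::<Marker>(),
        size_of::<()>(),
        size_of::<[u64; 0]>(),
        size_of::<Option<()>>(),
        size_of::<Result<(), ()>>(),
    ]
}

/// A set built on `HashMap<T, ()>`: the values take up no space.
fn dedup_count(items: &[u32]) -> usize {
    let mut seen: HashMap<u32, ()> = HashMap::new();
    for &item in items {
        seen.insert(item, ());
    }
    seen.len()
}
```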
|
||||
|
||||
Safe code need not worry about ZSTs, but *unsafe* code must be careful about the
|
||||
consequence of types with no size. In particular, pointer offsets are no-ops, and
|
||||
standard allocators (including jemalloc, the one used by Rust) generally consider
|
||||
passing in `0` for the size of an allocation as Undefined Behaviour.
|
||||
|
||||
|
@ -0,0 +1,72 @@
|
||||
% Higher-Rank Trait Bounds (HRTBs)
|
||||
|
||||
Rust's Fn traits are a little bit magic. For instance, we can write the
|
||||
following code:
|
||||
|
||||
```rust
struct Closure<F> {
    data: (u8, u16),
    func: F,
}

impl<F> Closure<F>
    where F: Fn(&(u8, u16)) -> &u8,
{
    fn call(&self) -> &u8 {
        (self.func)(&self.data)
    }
}

fn do_it(data: &(u8, u16)) -> &u8 { &data.0 }

fn main() {
    let clo = Closure { data: (0, 1), func: do_it };
    println!("{}", clo.call());
}
```
|
||||
|
||||
If we try to naively desugar this code in the same way that we did in the
|
||||
lifetimes section, we run into some trouble:
|
||||
|
||||
```rust
struct Closure<F> {
    data: (u8, u16),
    func: F,
}

impl<F> Closure<F>
    // where F: Fn(&'??? (u8, u16)) -> &'??? u8,
{
    fn call<'a>(&'a self) -> &'a u8 {
        (self.func)(&self.data)
    }
}

fn do_it<'b>(data: &'b (u8, u16)) -> &'b u8 { &'b data.0 }

fn main() {
    'x: {
        let clo = Closure { data: (0, 1), func: do_it };
        println!("{}", clo.call());
    }
}
```
|
||||
|
||||
How on earth are we supposed to express the lifetimes on F's trait bound? We need
|
||||
to provide some lifetime there, but the lifetime we care about can't be named until
|
||||
we enter the body of `call`! Also, that isn't some fixed lifetime; call works with
|
||||
*any* lifetime `&self` happens to have at that point.
|
||||
|
||||
This job requires The Magic of Higher-Rank Trait Bounds. The way we desugar
|
||||
this is as follows:
|
||||
|
||||
```rust
where for<'a> F: Fn(&'a (u8, u16)) -> &'a u8,
```
|
||||
|
||||
(Where `Fn(a, b, c) -> d` is itself just sugar for the unstable *real* Fn trait)
|
||||
|
||||
`for<'a>` can be read as "for all choices of `'a`", and basically produces an
|
||||
*infinite list* of trait bounds that F must satisfy. Intense. There aren't many
|
||||
places outside of the Fn traits where we encounter HRTBs, and even for those we
|
||||
have a nice magic sugar for the common cases.
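
For completeness, here is the `Closure` example with the bound spelled out explicitly, as a sketch. The helper names `first` and `call_first` are hypothetical. The only change to the impl is that the sugared bound is replaced by its `for<'a>` form.

```rust
/// Same shape as the `Closure` example, with the bound written out
/// explicitly as a higher-rank trait bound.
struct Closure<F> {
    data: (u8, u16),
    func: F,
}

impl<F> Closure<F>
where
    for<'a> F: Fn(&'a (u8, u16)) -> &'a u8,
{
    fn call(&self) -> &u8 {
        (self.func)(&self.data)
    }
}

fn first(data: &(u8, u16)) -> &u8 {
    &data.0
}

fn call_first() -> u8 {
    let clo = Closure { data: (7, 1), func: first };
    *clo.call()
}
```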
|
@ -0,0 +1,229 @@
|
||||
% Leaking
|
||||
|
||||
Ownership based resource management is intended to simplify composition. You
|
||||
acquire resources when you create the object, and you release the resources
|
||||
when it gets destroyed. Since destruction is handled for you, it means you
|
||||
can't forget to release the resources, and it happens as soon as possible!
|
||||
Surely this is perfect and all of our problems are solved.
|
||||
|
||||
Everything is terrible and we have new and exotic problems to try to solve.
|
||||
|
||||
Many people like to believe that Rust eliminates resource leaks, but this
|
||||
is absolutely not the case, no matter how you look at it. In the strictest
|
||||
sense, "leaking" is so abstract as to be unpreventable. It's quite trivial
|
||||
to initialize a collection at the start of a program, fill it with tons of
|
||||
objects with destructors, and then enter an infinite event loop that never
|
||||
refers to it. The collection will sit around uselessly, holding on to its
|
||||
precious resources until the program terminates (at which point all those
|
||||
resources would have been reclaimed by the OS anyway).
|
||||
|
||||
We may consider a more restricted form of leak: failing to drop a value that
|
||||
is unreachable. Rust also doesn't prevent this. In fact Rust has a *function
|
||||
for doing this*: `mem::forget`. This function consumes the value it is passed
|
||||
*and then doesn't run its destructor*.
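
A minimal sketch of that behaviour follows. `CountsDrop` and `forget_skips_drop` are hypothetical names. One value is dropped normally at the end of a scope and a second one is passed to `mem::forget`, and a shared counter records how many destructors actually ran.

```rust
use std::cell::Cell;
use std::mem;

/// Increments a shared counter when dropped.
struct CountsDrop<'a>(&'a Cell<u32>);

impl<'a> Drop for CountsDrop<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns (drops after a normal scope exit, drops after `mem::forget`).
fn forget_skips_drop() -> (u32, u32) {
    let dropped = Cell::new(0);
    {
        let _value = CountsDrop(&dropped);
        // `_value` is dropped normally at the end of this scope.
    }
    let after_scope = dropped.get();

    // The value is consumed, but its destructor never runs.
    mem::forget(CountsDrop(&dropped));
    (after_scope, dropped.get())
}
```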
|
||||
|
||||
In the past `mem::forget` was marked as unsafe as a sort of lint against using
|
||||
it, since failing to call a destructor is generally not a well-behaved thing to
|
||||
do (though useful for some special unsafe code). However this was generally
|
||||
determined to be an untenable stance to take: there are *many* ways to fail to
|
||||
call a destructor in safe code. The most famous example is creating a cycle
|
||||
of reference counted pointers using interior mutability.
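
The cycle mentioned above can be built entirely in safe code. In this sketch, `Node` and `leak_cycle` are hypothetical names, and a `Weak` handle is used only to observe that the first node is still alive after every owning handle we hold has been dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

/// Builds a two-node cycle, drops both handles, and reports whether the
/// first node is still alive and how many strong references it has left.
fn leak_cycle() -> (bool, usize) {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    // Close the cycle: `a` now points at `b`, which points back at `a`.
    *a.next.borrow_mut() = Some(Rc::clone(&b));

    let watcher = Rc::downgrade(&a);
    drop(a);
    drop(b);

    // Both handles are gone, but each node still keeps the other alive,
    // so neither destructor will ever run.
    let still_alive = watcher.upgrade().is_some();
    let strong = watcher.strong_count();
    (still_alive, strong)
}
```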
|
||||
|
||||
It is reasonable for safe code to assume that destructor leaks do not happen,
|
||||
as any program that leaks destructors is probably wrong. However *unsafe* code
|
||||
cannot rely on destructors to be run to be *safe*. For most types this doesn't
|
||||
matter: if you leak the destructor then the type is *by definition* inaccessible,
|
||||
so it doesn't matter, right? For instance, if you leak a `Box<u8>` then you
|
||||
waste some memory but that's hardly going to violate memory-safety.
|
||||
|
||||
However where we must be careful with destructor leaks are *proxy* types.
|
||||
These are types which manage access to a distinct object, but don't actually
|
||||
own it. Proxy objects are quite rare. Proxy objects you'll need to care about
|
||||
are even rarer. However we'll focus on three interesting examples in the
|
||||
standard library:
|
||||
|
||||
* `vec::Drain`
|
||||
* `Rc`
|
||||
* `thread::scoped::JoinGuard`
|
||||
|
||||
|
||||
|
||||
## Drain
|
||||
|
||||
`drain` is a collections API that moves data out of the container without
|
||||
consuming the container. This enables us to reuse the allocation of a `Vec`
|
||||
after claiming ownership over all of its contents. It produces an iterator
|
||||
(Drain) that returns the contents of the Vec by-value.
|
||||
|
||||
Now, consider Drain in the middle of iteration: some values have been moved out,
|
||||
and others haven't. This means that part of the Vec is now full of logically
|
||||
uninitialized data! We could backshift all the elements in the Vec every time we
|
||||
remove a value, but this would have pretty catastrophic performance consequences.
|
||||
|
||||
Instead, we would like Drain to *fix* the Vec's backing storage when it is
|
||||
dropped. It should run itself to completion, backshift any elements that weren't
|
||||
removed (drain supports subranges), and then fix Vec's `len`. It's even
|
||||
unwinding-safe! Easy!
|
||||
|
||||
Now consider the following:
|
||||
|
||||
```
let mut vec = vec![Box::new(0); 4];

{
    // start draining, vec can no longer be accessed
    let mut drainer = vec.drain(..);

    // pull out two elements and immediately drop them
    drainer.next();
    drainer.next();

    // get rid of drainer, but don't call its destructor
    mem::forget(drainer);
}

// Oops, vec[0] was dropped, we're reading a pointer into free'd memory!
println!("{}", vec[0]);
```
|
||||
|
||||
This is pretty clearly Not Good. Unfortunately, we're kind of stuck between
|
||||
a rock and a hard place: maintaining consistent state at every step has
|
||||
an enormous cost (and would negate any benefits of the API). Failing to maintain
|
||||
consistent state gives us Undefined Behaviour in safe code (making the API
|
||||
unsound).
|
||||
|
||||
So what can we do? Well, we can pick a trivially consistent state: set the Vec's
|
||||
len to be 0 when we *start* the iteration, and fix it up if necessary in the
|
||||
destructor. That way, if everything executes like normal we get the desired
|
||||
behaviour with minimal overhead. But if someone has the *audacity* to mem::forget
|
||||
us in the middle of the iteration, all that does is *leak even more* (and possibly
|
||||
leave the Vec in an *unexpected* but consistent state). Since we've
|
||||
accepted that mem::forget is safe, this is definitely safe. We call leaks causing
|
||||
more leaks a *leak amplification*.
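
The effect can be observed from safe code with a sketch like the following, where `forget_drain` is a hypothetical helper. With the strategy described above one would expect the reported length to be 0 after draining the full range, but the only property this sketch relies on is that the `Vec` is left in some consistent state.

```rust
use std::mem;

/// Leaks a `Drain` part-way through iteration and reports the length
/// that the `Vec` is left with.
fn forget_drain() -> usize {
    let mut vec = vec![Box::new(0); 4];
    {
        let mut drainer = vec.drain(..);
        // Pull out two elements and immediately drop them.
        drainer.next();
        drainer.next();
        // Get rid of the drainer without running its destructor.
        mem::forget(drainer);
    }
    // Still safe to use: any elements the Vec reports are valid, and the
    // elements that are no longer reachable have simply been leaked.
    vec.len()
}
```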
|
||||
|
||||
|
||||
|
||||
|
||||
## Rc
|
||||
|
||||
Rc is an interesting case because at first glance it doesn't appear to be a
|
||||
proxy value at all. After all, it manages the data it points to, and dropping
|
||||
all the Rcs for a value will drop that value. Leaking an Rc doesn't seem like
|
||||
it would be particularly dangerous. It will leave the refcount permanently
|
||||
incremented and prevent the data from being freed or dropped, but that seems
|
||||
just like Box, right?
|
||||
|
||||
Nope.
|
||||
|
||||
Let's consider a simplified implementation of Rc:
|
||||
|
||||
```rust
struct Rc<T> {
    ptr: *mut RcBox<T>,
}

struct RcBox<T> {
    data: T,
    ref_count: usize,
}

impl<T> Rc<T> {
    fn new(data: T) -> Self {
        unsafe {
            // Wouldn't it be nice if heap::allocate worked like this?
            let ptr = heap::allocate::<RcBox<T>>();
            ptr::write(ptr, RcBox {
                data: data,
                ref_count: 1,
            });
            Rc { ptr: ptr }
        }
    }

    fn clone(&self) -> Self {
        unsafe {
            (*self.ptr).ref_count += 1;
        }
        Rc { ptr: self.ptr }
    }
}

impl<T> Drop for Rc<T> {
    fn drop(&mut self) {
        unsafe {
            (*self.ptr).ref_count -= 1;
            if (*self.ptr).ref_count == 0 {
                // drop the data and then free it
                ptr::read(self.ptr);
                heap::deallocate(self.ptr);
            }
        }
    }
}
```
|
||||
|
||||
This code contains an implicit and subtle assumption: ref_count can fit in a
|
||||
`usize`, because there can't be more than `usize::MAX` Rcs in memory. However
|
||||
this itself assumes that the ref_count accurately reflects the number of Rcs
|
||||
in memory, which we know is false with mem::forget. Using mem::forget we can
|
||||
overflow the ref_count, and then get it down to 0 with outstanding Rcs. Then we
|
||||
can happily use-after-free the inner data. Bad Bad Not Good.
|
||||
|
||||
This can be solved by *saturating* the ref_count, which is sound because
|
||||
decreasing the refcount by `n` still requires `n` Rcs simultaneously living
|
||||
in memory.
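
To make the overflow concrete, here is a minimal sketch that shrinks the counter to a `u8`, so that wrap-around takes 255 steps instead of `usize::MAX`. The functions `clone_wrapping` and `clone_saturating` are hypothetical stand-ins for the `ref_count += 1` in `clone` above; this is not how the standard library's Rc is written.

```rust
// Hypothetical stand-ins for the `ref_count += 1` in `clone`, using `u8`
// instead of `usize` so the overflow is cheap to reach.

/// Increment that wraps around on overflow (what `+= 1` does when overflow
/// checks are disabled).
fn clone_wrapping(ref_count: u8) -> u8 {
    ref_count.wrapping_add(1)
}

/// Increment that sticks at the maximum instead of wrapping.
fn clone_saturating(ref_count: u8) -> u8 {
    ref_count.saturating_add(1)
}

fn main() {
    // Start with one live handle, then simulate 255 clones whose handles
    // are all passed to mem::forget (so no decrement ever happens).
    let mut wrapping = 1u8;
    let mut saturating = 1u8;
    for _ in 0..255 {
        wrapping = clone_wrapping(wrapping);
        saturating = clone_saturating(saturating);
    }

    // The wrapping counter now claims there are no handles at all, although
    // the original one is still live. One more clone followed by its drop
    // would bring the count back to 0 and free the data under that handle.
    assert_eq!(wrapping, 0);

    // The saturating counter never wraps back to 0 on an increment.
    assert_eq!(saturating, u8::MAX);
}
```

With a real `usize` counter the same argument applies, only with far more forgotten clones.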
|
||||
|
||||
|
||||
|
||||
|
||||
## thread::scoped::JoinGuard
|
||||
|
||||
The thread::scoped API intends to allow threads to be spawned that reference
|
||||
data on the stack without any synchronization over that data. Usage looked like:
|
||||
|
||||
```rust
|
||||
let mut data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
|
||||
{
|
||||
let mut guards = vec![];
|
||||
for x in &mut data {
|
||||
// Move the mutable reference into the closure, and execute
|
||||
// it on a different thread. The closure has a lifetime bound
|
||||
// by the lifetime of the mutable reference `x` we store in it.
|
||||
// The guard that is returned is in turn assigned the lifetime
|
||||
// of the closure, so it also mutably borrows `data` as `x` did.
|
||||
// This means we cannot access `data` until the guard goes away.
|
||||
let guard = thread::scoped(move || {
|
||||
*x *= 2;
|
||||
});
|
||||
// store the thread's guard for later
|
||||
guards.push(guard);
|
||||
}
|
||||
// All guards are dropped here, forcing the threads to join
|
||||
// (this thread blocks here until the others terminate).
|
||||
// Once the threads join, the borrow expires and the data becomes
|
||||
// accessible again in this thread.
|
||||
}
|
||||
// data is definitely mutated here.
|
||||
```
|
||||
|
||||
In principle, this totally works! Rust's ownership system perfectly ensures it!
|
||||
...except it relies on a destructor being called to be safe.
|
||||
|
||||
```
|
||||
let mut data = Box::new(0);
|
||||
{
|
||||
let guard = thread::scoped(|| {
|
||||
// This is at best a data race. At worst, it's *also* a use-after-free.
|
||||
*data += 1;
|
||||
});
|
||||
// The guard is forgotten, which expires the loan without blocking this
// thread.
|
||||
mem::forget(guard);
|
||||
}
|
||||
// So the Box is dropped here while the scoped thread may or may not be trying
|
||||
// to access it.
|
||||
```
|
||||
|
||||
Dang. Here the destructor running was pretty fundamental to the API, and it had
|
||||
to be scrapped in favour of a completely different design.
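
For comparison, here is a sketch of a closure-based design that does not depend on a destructor running. It uses `std::thread::scope`, which was added to the standard library in Rust 1.63, long after this text was written; whether it matches the replacement the author had in mind is an assumption. The helper `double_all` is a hypothetical name.

```rust
use std::thread;

/// Doubles every element, using one scoped thread per element.
fn double_all(data: &mut [i32]) {
    thread::scope(|s| {
        for x in data.iter_mut() {
            // Each closure takes ownership of one `&mut i32`.
            s.spawn(move || {
                *x *= 2;
            });
        }
        // No guard is handed back to the caller, so there is nothing to
        // forget: `thread::scope` itself joins every spawned thread before
        // it returns.
    });
}

fn main() {
    let mut data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    double_all(&mut data);
    // All threads have been joined, so the data is accessible again.
    assert_eq!(data, [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]);
}
```

The join happens inside a function call rather than in a `Drop` impl, so `mem::forget` has nothing to skip.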
|
@ -0,0 +1,64 @@
|
||||
% Lifetime Elision
|
||||
|
||||
In order to make common patterns more ergonomic, Rust allows lifetimes to be
|
||||
*elided* in function signatures.
|
||||
|
||||
A *lifetime position* is anywhere you can write a lifetime in a type:
|
||||
|
||||
```rust
|
||||
&'a T
|
||||
&'a mut T
|
||||
T<'a>
|
||||
```
|
||||
|
||||
Lifetime positions can appear as either "input" or "output":
|
||||
|
||||
* For `fn` definitions, input refers to the types of the formal arguments
|
||||
in the `fn` definition, while output refers to
|
||||
result types. So `fn foo(s: &str) -> (&str, &str)` has elided one lifetime in
|
||||
input position and two lifetimes in output position.
|
||||
Note that the input positions of a `fn` method definition do not
|
||||
include the lifetimes that occur in the method's `impl` header
|
||||
(nor lifetimes that occur in the trait header, for a default method).
|
||||
|
||||
* In the future, it should be possible to elide `impl` headers in the same manner.
|
||||
|
||||
Elision rules are as follows:
|
||||
|
||||
* Each elided lifetime in input position becomes a distinct lifetime
|
||||
parameter.
|
||||
|
||||
* If there is exactly one input lifetime position (elided or not), that lifetime
|
||||
is assigned to *all* elided output lifetimes.
|
||||
|
||||
* If there are multiple input lifetime positions, but one of them is `&self` or
|
||||
`&mut self`, the lifetime of `self` is assigned to *all* elided output lifetimes.
|
||||
|
||||
* Otherwise, it is an error to elide an output lifetime.
|
||||
|
||||
Examples:
|
||||
|
||||
```rust
|
||||
fn print(s: &str); // elided
|
||||
fn print<'a>(s: &'a str); // expanded
|
||||
|
||||
fn debug(lvl: usize, s: &str); // elided
fn debug<'a>(lvl: usize, s: &'a str); // expanded

fn substr(s: &str, until: usize) -> &str; // elided
fn substr<'a>(s: &'a str, until: usize) -> &'a str; // expanded
|
||||
|
||||
fn get_str() -> &str; // ILLEGAL
|
||||
|
||||
fn frob(s: &str, t: &str) -> &str; // ILLEGAL
|
||||
|
||||
fn get_mut(&mut self) -> &mut T; // elided
|
||||
fn get_mut<'a>(&'a mut self) -> &'a mut T; // expanded
|
||||
|
||||
fn args<T:ToCStr>(&mut self, args: &[T]) -> &mut Command; // elided
fn args<'a, 'b, T:ToCStr>(&'a mut self, args: &'b [T]) -> &'a mut Command; // expanded

fn new(buf: &mut [u8]) -> BufWriter; // elided
fn new<'a>(buf: &'a mut [u8]) -> BufWriter<'a>; // expanded
|
||||
|
||||
```
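
The signatures above are declarations only. As a rough runnable sketch of two of the rules, the following gives `substr` a body in both forms and adds a hypothetical `Wrapper::get` method to show the `&self` rule; `Wrapper`, `get`, and `substr_expanded` are illustrative names, not part of any library.

```rust
// Elided and expanded forms of the same signature; they are interchangeable.
fn substr(s: &str, until: usize) -> &str {
    &s[..until]
}

fn substr_expanded<'a>(s: &'a str, until: usize) -> &'a str {
    &s[..until]
}

struct Wrapper {
    value: String,
}

impl Wrapper {
    // Two input lifetimes, but one of them is `&self`, so the output borrows
    // from `self` only. Expanded form:
    // fn get<'a, 'b>(&'a self, _label: &'b str) -> &'a str
    fn get(&self, _label: &str) -> &str {
        &self.value
    }
}

fn main() {
    assert_eq!(substr("hello world", 5), "hello");
    assert_eq!(substr_expanded("hello world", 5), "hello");

    let wrapper = Wrapper { value: String::from("kept") };
    let got = {
        let label = String::from("temporary");
        let result = wrapper.get(&label);
        // `label` is dropped at the end of this block, yet `result` stays
        // usable because its lifetime is tied to `wrapper`, not to `label`.
        result
    };
    assert_eq!(got, "kept");
}
```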
|
@ -0,0 +1,229 @@
|
||||
% misc
|
||||
|
||||
This is just a dumping ground while I work out what to do with this stuff
|
||||
|
||||
|
||||
# PhantomData
|
||||
|
||||
When working with unsafe code, we can often end up in a situation where
|
||||
types or lifetimes are logically associated with a struct, but not actually
|
||||
part of a field. This most commonly occurs with lifetimes. For instance, the `Iter`
|
||||
for `&'a [T]` is (approximately) defined as follows:
|
||||
|
||||
```rust
|
||||
pub struct Iter<'a, T: 'a> {
|
||||
ptr: *const T,
|
||||
end: *const T,
|
||||
}
|
||||
```
|
||||
|
||||
However because `'a` is unused within the struct's body, it's *unbound*.
|
||||
Because of the troubles this has historically caused, unbound lifetimes and
|
||||
types are *illegal* in struct definitions. Therefore we must somehow refer
|
||||
to these types in the body. Correctly doing this is necessary to have
|
||||
correct variance and drop checking.
|
||||
|
||||
We do this using *PhantomData*, which is a special marker type. PhantomData
|
||||
consumes no space, but simulates a field of the given type for the purpose of
|
||||
static analysis. This was deemed to be less error-prone than explicitly telling
|
||||
the type-system the kind of variance that you want, while also providing other
|
||||
useful information.
|
||||
|
||||
Iter logically contains `&'a T`, so this is exactly what we tell
|
||||
the PhantomData to simulate:
|
||||
|
||||
```
|
||||
pub struct Iter<'a, T: 'a> {
|
||||
ptr: *const T,
|
||||
end: *const T,
|
||||
_marker: marker::PhantomData<&'a T>,
|
||||
}
|
||||
```
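
To see the marker doing its job, here is a minimal sketch that fills in a constructor and an `Iterator` impl around the struct above. The `new` function and the `collect_copies` helper are assumptions added for illustration, and zero-sized `T` is deliberately not handled (the real slice iterator treats it specially).

```rust
use std::marker::PhantomData;
use std::mem::size_of;

pub struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> Iter<'a, T> {
    /// Hypothetical constructor; ties `'a` to the borrowed slice.
    pub fn new(slice: &'a [T]) -> Self {
        let ptr = slice.as_ptr();
        // SAFETY: `ptr + len` is at most one past the end of the slice.
        let end = unsafe { ptr.add(slice.len()) };
        Iter { ptr, end, _marker: PhantomData }
    }
}

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.ptr == self.end {
            return None;
        }
        // SAFETY: `ptr` is strictly before `end`, so it points at an element
        // of the slice, which stays borrowed for all of `'a`.
        unsafe {
            let item = &*self.ptr;
            self.ptr = self.ptr.add(1);
            Some(item)
        }
    }
}

/// Small helper so the iterator can be exercised end to end.
fn collect_copies(slice: &[i32]) -> Vec<i32> {
    Iter::new(slice).copied().collect()
}

fn main() {
    assert_eq!(collect_copies(&[1, 2, 3]), vec![1, 2, 3]);
    // The marker itself occupies no space.
    assert_eq!(size_of::<PhantomData<&'static u8>>(), 0);
}
```

Because of the `PhantomData<&'a T>` field, the borrow checker treats an `Iter` as borrowing the slice, so the slice cannot be mutated or dropped while the iterator is alive.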
|
||||
|
||||
|
||||
|
||||
|
||||
# Dropck
|
||||
|
||||
When a type is going out of scope, Rust will try to Drop it. Drop executes
|
||||
arbitrary code, and in fact allows us to "smuggle" arbitrary code execution
|
||||
into many places. As such additional soundness checks (dropck) are necessary to
|
||||
ensure that a type T can be safely instantiated and dropped. It turns out that we
|
||||
*really* don't need to care about dropck in practice, as it often "just works".
|
||||
|
||||
However the one exception is with PhantomData. Given a struct like Vec:
|
||||
|
||||
```
|
||||
struct Vec<T> {
|
||||
data: *const T, // *const for variance!
|
||||
len: usize,
|
||||
cap: usize,
|
||||
}
|
||||
```
|
||||
|
||||
dropck will generously determine that Vec<T> does not own any values of
|
||||
type T. This will unfortunately allow people to construct unsound Drop
|
||||
implementations that access data that has already been dropped. In order to
|
||||
tell dropck that we *do* own values of type T, and may call destructors of that
|
||||
type, we must add extra PhantomData:
|
||||
|
||||
```
|
||||
struct Vec<T> {
|
||||
data: *const T, // *const for covariance!
|
||||
len: usize,
|
||||
cap: usize,
|
||||
_marker: marker::PhantomData<T>,
|
||||
}
|
||||
```
|
||||
|
||||
Raw pointers that own an allocation are such a pervasive pattern that the
|
||||
standard library made a utility for itself called `Unique<T>` which:
|
||||
|
||||
* wraps a `*const T`,
|
||||
* includes a `PhantomData<T>`,
|
||||
* auto-derives Send/Sync as if T was contained
|
||||
* marks the pointer as NonZero for the null-pointer optimization
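
`Unique<T>` is internal to the standard library, but the same four properties can be approximated from stable pieces. The sketch below is an assumption-laden illustration: `OwnedPtr` is a hypothetical name, and it uses `std::ptr::NonNull` (a stable non-null, covariant pointer wrapper) in place of the `*const T` plus NonZero marking described above.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical `Unique`-like wrapper built from stable pieces.
struct OwnedPtr<T> {
    ptr: NonNull<T>,         // non-null and covariant over `T`
    _marker: PhantomData<T>, // tells dropck that we own a `T`
}

// Raw pointers opt out of Send/Sync, so restore them as if a `T` were
// stored inline.
unsafe impl<T: Send> Send for OwnedPtr<T> {}
unsafe impl<T: Sync> Sync for OwnedPtr<T> {}

impl<T> OwnedPtr<T> {
    fn new(value: T) -> Self {
        // Leak a Box to obtain a heap allocation that we now own.
        let ptr = NonNull::from(Box::leak(Box::new(value)));
        OwnedPtr { ptr, _marker: PhantomData }
    }

    fn get(&self) -> &T {
        // SAFETY: `ptr` came from a live Box that only we can free.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for OwnedPtr<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` was produced by Box::leak in `new` and has not been
        // freed; rebuilding the Box drops the `T` and frees the allocation.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) }
    }
}

fn main() {
    let owned = OwnedPtr::new(String::from("heap"));
    assert_eq!(owned.get().as_str(), "heap");

    // The null-pointer optimization: `None` is represented by the null value.
    assert_eq!(size_of::<Option<NonNull<u64>>>(), size_of::<*const u64>());
}
```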
|
||||
|
||||
|
||||
|
||||
|
||||
# Splitting Lifetimes
|
||||
|
||||
The mutual exclusion property of mutable references can be very limiting when
|
||||
working with a composite structure. The borrow checker understands some basic stuff, but
|
||||
will fall over pretty easily. It *does* understand structs sufficiently to
|
||||
know that it's possible to borrow disjoint fields of a struct simultaneously.
|
||||
So this works today:
|
||||
|
||||
```rust
|
||||
struct Foo {
|
||||
a: i32,
|
||||
b: i32,
|
||||
c: i32,
|
||||
}
|
||||
|
||||
let mut x = Foo {a: 0, b: 0, c: 0};
|
||||
let a = &mut x.a;
|
||||
let b = &mut x.b;
|
||||
let c = &x.c;
|
||||
*b += 1;
|
||||
let c2 = &x.c;
|
||||
*a += 10;
|
||||
println!("{} {} {} {}", a, b, c, c2);
|
||||
```
|
||||
|
||||
However borrowck doesn't understand arrays or slices in any way, so this doesn't
|
||||
work:
|
||||
|
||||
```rust
|
||||
let x = [1, 2, 3];
|
||||
let a = &mut x[0];
|
||||
let b = &mut x[1];
|
||||
println!("{} {}", a, b);
|
||||
```
|
||||
|
||||
```text
|
||||
<anon>:3:18: 3:22 error: cannot borrow immutable indexed content `x[..]` as mutable
|
||||
<anon>:3 let a = &mut x[0];
|
||||
^~~~
|
||||
<anon>:4:18: 4:22 error: cannot borrow immutable indexed content `x[..]` as mutable
|
||||
<anon>:4 let b = &mut x[1];
|
||||
^~~~
|
||||
error: aborting due to 2 previous errors
|
||||
```
|
||||
|
||||
While it was plausible that borrowck could understand this simple case, it's
|
||||
pretty clearly hopeless for borrowck to understand disjointness in general
|
||||
container types like a tree, especially if distinct keys actually *do* map
|
||||
to the same value.
|
||||
|
||||
In order to "teach" borrowck that what we're doing is ok, we need to drop down
|
||||
to unsafe code. For instance, mutable slices expose a `split_at_mut` function that
|
||||
consumes the slice and returns *two* mutable slices. One for everything to the
|
||||
left of the index, and one for everything to the right. Intuitively we know this
|
||||
is safe because the slices don't alias. However the implementation requires some
|
||||
unsafety:
|
||||
|
||||
```rust
|
||||
fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
|
||||
unsafe {
|
||||
let self2: &mut [T] = mem::transmute_copy(&self);
|
||||
|
||||
(ops::IndexMut::index_mut(self, ops::RangeTo { end: mid } ),
|
||||
ops::IndexMut::index_mut(self2, ops::RangeFrom { start: mid } ))
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This is pretty plainly dangerous. We use transmute to duplicate the slice with an
|
||||
*unbounded* lifetime, so that it can be treated as disjoint from the other until
|
||||
we unify them when we return.
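
From the caller's side none of that unsafety is visible. As a small usage sketch (the helper `bump_first_two` is a hypothetical name), the array example that borrowck rejected earlier can be written with `split_at_mut`:

```rust
/// Adds 10 to element 0 and 20 to element 1 through two mutable references
/// that are live at the same time. Panics if the slice has fewer than two
/// elements.
fn bump_first_two(x: &mut [i32]) {
    // `left` covers index 0, `right` covers everything from index 1 on.
    let (left, right) = x.split_at_mut(1);
    let a = &mut left[0];
    let b = &mut right[0];
    *a += 10;
    *b += 20;
}

fn main() {
    let mut x = [1, 2, 3];
    bump_first_two(&mut x);
    assert_eq!(x, [11, 22, 3]);
}
```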
|
||||
|
||||
However more subtle is how iterators that yield mutable references work.
|
||||
The iterator trait is defined as follows:
|
||||
|
||||
```rust
|
||||
trait Iterator {
|
||||
type Item;
|
||||
|
||||
fn next(&mut self) -> Option<Self::Item>;
|
||||
}
|
||||
```
|
||||
|
||||
Given this definition, Self::Item has *no* connection to `self`. This means
|
||||
that we can call `next` several times in a row, and hold onto all the results
|
||||
*concurrently*. This is perfectly fine for by-value iterators, which have exactly
|
||||
these semantics. It's also actually fine for shared references, as they admit
|
||||
arbitrarily many references to the same thing (although the
|
||||
iterator needs to be a separate object from the thing being shared). But mutable
|
||||
references make this a mess. At first glance, they might seem completely
|
||||
incompatible with this API, as it would produce multiple mutable references to
|
||||
the same object!
|
||||
|
||||
However it actually *does* work, exactly because iterators are one-shot objects.
|
||||
Everything an IterMut yields will be yielded *at most* once, so we don't *actually*
|
||||
ever yield multiple mutable references to the same piece of data.
|
||||
|
||||
In general all mutable iterators require *some* unsafe code *somewhere*, though.
|
||||
Whether it's raw pointers, or safely composing on top of *another* IterMut.
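
As a quick illustration of holding several results of `next` at once, here is a sketch using the standard slice `iter_mut`; the helper name `swap_first_two` is made up for this example.

```rust
/// Swaps the first two elements using two `&mut` references obtained from
/// the same iterator. Does nothing if there are fewer than two elements.
fn swap_first_two(data: &mut [i32]) {
    let mut iter = data.iter_mut();
    // Both references are alive at the same time. That is fine, because the
    // iterator will never yield either element again.
    if let (Some(a), Some(b)) = (iter.next(), iter.next()) {
        std::mem::swap(a, b);
    }
}

fn main() {
    let mut data = vec![1, 2, 3];
    swap_first_two(&mut data);
    assert_eq!(data, [2, 1, 3]);
}
```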
|
||||
|
||||
For instance, VecDeque's IterMut:
|
||||
|
||||
```rust
|
||||
pub struct IterMut<'a, T:'a> {
|
||||
// The whole backing array. Some of these indices are initialized!
|
||||
ring: &'a mut [T],
|
||||
tail: usize,
|
||||
head: usize,
|
||||
}
|
||||
|
||||
impl<'a, T> Iterator for IterMut<'a, T> {
|
||||
type Item = &'a mut T;
|
||||
|
||||
fn next(&mut self) -> Option<&'a mut T> {
|
||||
if self.tail == self.head {
|
||||
return None;
|
||||
}
|
||||
let tail = self.tail;
|
||||
self.tail = wrap_index(self.tail.wrapping_add(1), self.ring.len());
|
||||
|
||||
unsafe {
|
||||
// might as well do unchecked indexing since wrap_index has us
|
||||
// in-bounds, and many of the "middle" indices are uninitialized
|
||||
// anyway.
|
||||
let elem = self.ring.get_unchecked_mut(tail);
|
||||
|
||||
// round-trip through a raw pointer to unbound the lifetime from
|
||||
// ourselves
|
||||
Some(&mut *(elem as *mut _))
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
A very subtle but interesting detail in this design is that it *relies on
|
||||
privacy to be sound*. Borrowck works on some very simple rules. One of those rules
|
||||
is that if we have a live &mut Foo and Foo contains an &mut Bar, then that &mut
|
||||
Bar is *also* live. Since IterMut is always live when `next` can be called, if
|
||||
`ring` were public then we could mutate `ring` while outstanding mutable borrows
|
||||
to it exist!
|
@ -0,0 +1,81 @@
|
||||
% Limits of Lifetimes
|
||||
|
||||
Given the following code:
|
||||
|
||||
```rust,ignore
|
||||
struct Foo;
|
||||
|
||||
impl Foo {
|
||||
fn mutate_and_share(&mut self) -> &Self { &*self }
|
||||
fn share(&self) {}
|
||||
}
|
||||
|
||||
fn main() {
|
||||
let mut foo = Foo;
|
||||
let loan = foo.mutate_and_share();
|
||||
foo.share();
|
||||
}
|
||||
```
|
||||
|
||||
One might expect it to compile. We call `mutate_and_share`, which mutably borrows
|
||||
`foo` *temporarily*, but then returns *only* a shared reference. Therefore we
|
||||
would expect `foo.share()` to succeed as `foo` shouldn't be mutably borrowed.
|
||||
|
||||
However when we try to compile it:
|
||||
|
||||
```text
|
||||
<anon>:11:5: 11:8 error: cannot borrow `foo` as immutable because it is also borrowed as mutable
|
||||
<anon>:11 foo.share();
|
||||
^~~
|
||||
<anon>:10:16: 10:19 note: previous borrow of `foo` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `foo` until the borrow ends
|
||||
<anon>:10 let loan = foo.mutate_and_share();
|
||||
^~~
|
||||
<anon>:12:2: 12:2 note: previous borrow ends here
|
||||
<anon>:8 fn main() {
|
||||
<anon>:9 let mut foo = Foo;
|
||||
<anon>:10 let loan = foo.mutate_and_share();
|
||||
<anon>:11 foo.share();
|
||||
<anon>:12 }
|
||||
^
|
||||
```
|
||||
|
||||
What happened? Well, we got the exact same reasoning as we did for
|
||||
[Example 2 in the previous section][ex2]. We desugar the program and we get
|
||||
the following:
|
||||
|
||||
```rust,ignore
|
||||
struct Foo;
|
||||
|
||||
impl Foo {
|
||||
fn mutate_and_share<'a>(&'a mut self) -> &'a Self { &'a *self }
|
||||
fn share<'a>(&'a self) {}
|
||||
}
|
||||
|
||||
fn main() {
|
||||
'b: {
|
||||
let mut foo: Foo = Foo;
|
||||
'c: {
|
||||
let loan: &'c Foo = Foo::mutate_and_share::<'c>(&'c mut foo);
|
||||
'd: {
|
||||
Foo::share::<'d>(&'d foo);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The lifetime system is forced to extend the `&mut foo` to have lifetime `'c`,
|
||||
due to the lifetime of `loan` and mutate_and_share's signature. Then when we
|
||||
try to call `share`, it sees we're trying to alias that `&'c mut foo` and
|
||||
blows up in our face!
|
||||
|
||||
This program is clearly correct according to the reference semantics we *actually*
|
||||
care about, but the lifetime system is too coarse-grained to handle that.
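
One way to stay within what the lifetime system accepts is to keep the mutation and the sharing in separate methods, so the mutable borrow ends before any shared loan is created. This is only a sketch of that restructuring; the `value` field and the `mutate`, `read`, and `mutate_then_share_twice` names are assumptions added to make the example observable.

```rust
struct Foo {
    value: i32,
}

impl Foo {
    fn mutate(&mut self) {
        self.value += 1;
    }

    fn share(&self) -> &Self {
        self
    }

    fn read(&self) -> i32 {
        self.value
    }
}

/// Mutates once, then takes two overlapping shared borrows.
fn mutate_then_share_twice(foo: &mut Foo) -> (i32, i32) {
    foo.mutate(); // the mutable borrow ends with this statement
    let loan = foo.share(); // shared borrow, kept alive below
    let second = foo.read(); // a second shared borrow is fine
    (loan.value, second)
}

fn main() {
    let mut foo = Foo { value: 0 };
    assert_eq!(mutate_then_share_twice(&mut foo), (1, 1));
}
```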
|
||||
|
||||
|
||||
TODO: other common problems? SEME regions stuff, mostly?
|
||||
|
||||
|
||||
|
||||
|
||||
[ex2]: lifetimes.html#example-2:-aliasing-a-mutable-reference
|
@ -0,0 +1,82 @@
|
||||
% Meet Safe and Unsafe
|
||||
|
||||
Safe and Unsafe are Rust's chief engineers.
|
||||
|
||||
TODO: ADORABLE PICTURES OMG
|
||||
|
||||
Unsafe handles all the dangerous internal stuff. They build the foundations
|
||||
and handle all the dangerous materials. By all accounts, Unsafe is really a bit
|
||||
unproductive, because the nature of their work means that they have to spend a
|
||||
lot of time checking and double-checking everything. What if there's an earthquake
|
||||
on a leap year? Are we ready for that? Unsafe better be, because if they get
|
||||
*anything* wrong, everything will blow up! What Unsafe brings to the table is
|
||||
*quality*, not quantity. Still, nothing would ever get done if everything was
|
||||
built to Unsafe's standards!
|
||||
|
||||
That's where Safe comes in. Safe has to handle *everything else*. Since Safe needs
|
||||
to *get work done*, they've grown to be fairly careless and clumsy! Safe doesn't worry
|
||||
about all the crazy eventualities that Unsafe does, because life is too short to deal
|
||||
with leap-year-earthquakes. Of course, this means there's some jobs that Safe just
|
||||
can't handle. Safe is all about quantity over quality.
|
||||
|
||||
Unsafe loves Safe to bits, but knows that they *can never trust them to do the
|
||||
right thing*. Still, Unsafe acknowledges that not every problem needs quite the
|
||||
attention to detail that they apply. Indeed, Unsafe would *love* if Safe could do
|
||||
*everything* for them. To accomplish this, Unsafe spends most of their time
|
||||
building *safe abstractions*. These abstractions handle all the nitty-gritty
|
||||
details for Safe, and choose good defaults so that the simplest solution (which
|
||||
Safe will inevitably use) is usually the *right* one. Once a safe abstraction is
|
||||
built, Unsafe ideally needs to never work on it again, and Safe can blindly use
|
||||
it in all their work.
|
||||
|
||||
Unsafe's attention to detail means that all the things that they mark as ok for
|
||||
Safe to use can be combined in arbitrarily ridiculous ways, and all the rules
|
||||
that Unsafe is forced to uphold will never be violated. If they *can* be violated
|
||||
by Safe, that means *Unsafe*'s the one in the wrong. Safe can work carelessly,
|
||||
knowing that if anything blows up, it's not *their* fault. Safe can also call in
|
||||
Unsafe at any time if there's a hard problem they can't quite work out, or if they
|
||||
can't meet the client's quality demands. Of course, Unsafe will beg and plead Safe
|
||||
to try their latest safe abstraction first!
|
||||
|
||||
In addition to being adorable, Safe and Unsafe are what makes Rust possible.
|
||||
Rust can be thought of as two different languages: Safe Rust, and Unsafe Rust.
|
||||
Any time someone opines on the guarantees of Rust, they are almost surely talking about
|
||||
Safe. However Safe is not sufficient to write every program. For that,
|
||||
we need the Unsafe superset.
|
||||
|
||||
Most fundamentally, writing bindings to other languages
|
||||
(such as the C exposed by your operating system) is never going to be safe. Rust
|
||||
can't control what other languages do to program execution! However Unsafe is
|
||||
also necessary to construct fundamental abstractions where the type system is not
|
||||
sufficient to automatically prove what you're doing is sound.
|
||||
|
||||
Indeed, the Rust standard library is implemented in Rust, and it makes substantial
|
||||
use of Unsafe for implementing IO, memory allocation, collections,
|
||||
synchronization, and other low-level computational primitives.
|
||||
|
||||
Upon hearing this, many wonder why they would not simply just use C or C++ in place of
|
||||
Rust (or just use a "real" safe language). If we're going to do unsafe things, why not
|
||||
lean on these much more established languages?
|
||||
|
||||
The most important difference between C++ and Rust is a matter of defaults:
|
||||
Rust is 100% safe by default. Even when you *opt out* of safety in Rust, it is a modular
|
||||
action. Deciding to work with unchecked uninitialized memory does not
|
||||
suddenly make dangling or null pointers a problem. When using unchecked indexing on `x`,
|
||||
one does not have to suddenly worry about indexing out of bounds on `y`.
|
||||
C and C++, by contrast, have pervasive unsafety baked into the language. Even the
|
||||
modern best practices like `unique_ptr` have various safety pitfalls.
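
As a small sketch of what "modular" means here, the hypothetical function below opts out of bounds checking for `x` inside one `unsafe` block, while accesses to `y` remain fully checked. The function name and the early-return check are assumptions for illustration.

```rust
/// Returns the first element of each slice, or `None` if either is empty.
fn first_of_each(x: &[i32], y: &[i32]) -> Option<(i32, i32)> {
    if x.is_empty() {
        return None;
    }
    // SAFETY: we just checked that `x` has at least one element.
    let a = unsafe { *x.get_unchecked(0) };

    // Nothing about the unsafe block above affects `y`: this access is
    // still bounds checked and returns `None` when `y` is empty.
    let b = *y.get(0)?;
    Some((a, b))
}

fn main() {
    assert_eq!(first_of_each(&[1, 2], &[3]), Some((1, 3)));
    assert_eq!(first_of_each(&[1, 2], &[]), None);
    assert_eq!(first_of_each(&[], &[3]), None);
}
```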
|
||||
|
||||
It cannot be emphasized enough that Unsafe should be regarded as an exceptional
|
||||
thing, not a normal one. Unsafe is often the domain of *fundamental libraries*: anything that needs
|
||||
to make FFI bindings or define core abstractions. These fundamental libraries then expose
|
||||
a safe interface for intermediate libraries and applications to build upon. And these
|
||||
safe interfaces make an important promise: if your application segfaults, it's not your
|
||||
fault. *They* have a bug.
|
||||
|
||||
And really, how is that different from *any* safe language? Python, Ruby, and Java libraries
|
||||
can internally do all sorts of nasty things. The languages themselves are no
|
||||
different. Safe languages *regularly* have bugs that cause critical vulnerabilities.
|
||||
The fact that Rust is written with a healthy spoonful of Unsafe is no different.
|
||||
However it *does* mean that Rust doesn't need to fall back to the pervasive unsafety of
|
||||
C to do the nasty things that need to get done.
|
||||
|
@ -0,0 +1,61 @@
|
||||
% Alternative representations
|
||||
|
||||
Rust allows you to specify alternative data layout strategies from the default.
|
||||
|
||||
|
||||
|
||||
|
||||
# repr(C)
|
||||
|
||||
This is the most important `repr`. It has fairly simple intent: do what C does.
|
||||
The order, size, and alignment of fields is exactly what you would expect from
|
||||
C or C++. Any type you expect to pass through an FFI boundary should have `repr(C)`,
|
||||
as C is the lingua-franca of the programming world. This is also necessary
|
||||
to soundly do more elaborate tricks with data layout such as reinterpreting values
|
||||
as a different type.
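
A short sketch of what "do what C does" looks like in practice. `Header` is a made-up struct, and the expected numbers assume a target where `u32` is 4-byte aligned, which holds on the common desktop and server platforms.

```rust
use std::mem::{align_of, size_of};

// Fields are laid out in declaration order, with C's padding rules.
#[repr(C)]
struct Header {
    tag: u8,  // offset 0, followed by 3 bytes of padding
    len: u32, // offset 4
    flag: u8, // offset 8, followed by 3 bytes of trailing padding
}

/// Computes the byte offset of each field from the start of the struct.
fn field_offsets() -> (usize, usize, usize) {
    let h = Header { tag: 0, len: 0, flag: 0 };
    let base = &h as *const Header as usize;
    (
        &h.tag as *const u8 as usize - base,
        &h.len as *const u32 as usize - base,
        &h.flag as *const u8 as usize - base,
    )
}

fn main() {
    // Assumes `u32` has alignment 4 on the target.
    assert_eq!(field_offsets(), (0, 4, 8));
    assert_eq!(size_of::<Header>(), 12);
    assert_eq!(align_of::<Header>(), 4);
}
```

Without `repr(C)`, Rust is free to reorder the fields, so none of these offsets could be relied upon.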
|
||||
|
||||
However, the interaction with Rust's more exotic data layout features must be kept
|
||||
in mind. Due to its dual purpose as "for FFI" and "for layout control", `repr(C)`
|
||||
can be applied to types that will be nonsensical or problematic if passed through
|
||||
the FFI boundary.
|
||||
|
||||
* ZSTs are still zero-sized, even though this is not a standard behaviour
|
||||
in C, and is explicitly contrary to the behaviour of an empty type in C++, which
|
||||
still consumes a byte of space.
|
||||
|
||||
* DSTs, tuples, and tagged unions are not a concept in C and as such are never
|
||||
FFI safe.
|
||||
|
||||
* **The drop flag will still be added**
|
||||
|
||||
* This is equivalent to `repr(u32)` for enums (see below)
|
||||
|
||||
|
||||
|
||||
|
||||
# repr(packed)
|
||||
|
||||
`repr(packed)` forces Rust to strip any padding, and only align the type to a
|
||||
byte. This may improve the memory footprint, but will likely have other
|
||||
negative side-effects.
|
||||
|
||||
In particular, most architectures *strongly* prefer values to be aligned. This
|
||||
may mean the unaligned loads are penalized (x86), or even fault (ARM). In
|
||||
particular, the compiler may have trouble with references to unaligned fields.
|
||||
|
||||
`repr(packed)` is not to be used lightly. Unless you have extreme requirements,
|
||||
this should not be used.
|
||||
|
||||
This repr is a modifier on `repr(C)` and `repr(Rust)`.
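
As a rough illustration of the effect, the sketch below packs a made-up two-field struct. The names `PackedHeader` and `read_len` are assumptions for this example.

```rust
use std::mem::{align_of, size_of};

// With plain `repr(C)` this struct would typically be 8 bytes; packing
// removes the 3 bytes of padding after `tag`.
#[repr(C, packed)]
struct PackedHeader {
    tag: u8,
    len: u32, // sits at offset 1, so it is not aligned
}

/// Reads the unaligned field by value. Copying the field out is fine;
/// taking a reference to it (`&h.len`) is what the compiler rejects.
fn read_len(h: &PackedHeader) -> u32 {
    h.len
}

fn main() {
    assert_eq!(size_of::<PackedHeader>(), 5);
    assert_eq!(align_of::<PackedHeader>(), 1);

    let h = PackedHeader { tag: 1, len: 0xDEAD_BEEF };
    assert_eq!(read_len(&h), 0xDEAD_BEEF);
}
```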
|
||||
|
||||
|
||||
|
||||
|
||||
# repr(u8), repr(u16), repr(u32), repr(u64)
|
||||
|
||||
These specify the size to make a C-like enum. If the discriminant overflows the
|
||||
integer it has to fit in, it will be an error. You can manually ask Rust to
|
||||
allow this by setting the overflowing element to explicitly be 0. However Rust
|
||||
will not allow you to create an enum where two variants have the same discriminant.
|
||||
|
||||
These reprs have no effect on a struct or non-C-like enum.
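
A minimal sketch of a sized C-like enum; `Level` and `as_byte` are hypothetical names chosen for illustration.

```rust
use std::mem::size_of;

// Every discriminant must fit in a `u8`; a variant such as `High = 256`
// would be rejected by the compiler.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Level {
    Low = 0,
    Mid = 1,
    High = 255,
}

/// Converts a variant to its underlying integer value.
fn as_byte(level: Level) -> u8 {
    level as u8
}

fn main() {
    assert_eq!(size_of::<Level>(), 1);
    assert_eq!(as_byte(Level::Low), 0);
    assert_eq!(as_byte(Level::Mid), 1);
    assert_eq!(as_byte(Level::High), 255);
}
```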
|
@ -0,0 +1,66 @@
|
||||
% Ownership and Lifetimes
|
||||
|
||||
Ownership is the breakout feature of Rust. It allows Rust to be completely
|
||||
memory-safe and efficient, while avoiding garbage collection. Before getting
|
||||
into the ownership system in detail, we will consider the motivation of this
|
||||
design.
|
||||
|
||||
We will assume that you accept that garbage collection is not always an optimal
|
||||
solution, and that it is desirable to manually manage memory to some extent.
|
||||
If you do not accept this, might I interest you in a different language?
|
||||
|
||||
Regardless of your feelings on GC, it is pretty clearly a *massive* boon to
|
||||
making code safe. You never have to worry about things going away *too soon*
|
||||
(although whether you still *wanted* to be pointing at that thing is a different
|
||||
issue...). This is a pervasive problem that C and C++ need to deal with.
|
||||
Consider this simple mistake that all of us who have used a non-GC'd language
|
||||
have made at one point:
|
||||
|
||||
```rust
|
||||
fn as_str(data: &u32) -> &str {
|
||||
// compute the string
|
||||
let s = format!("{}", data);
|
||||
|
||||
// OH NO! We returned a reference to something that
|
||||
// exists only in this function!
|
||||
// Dangling pointer! Use after free! Alas!
|
||||
// (this does not compile in Rust)
|
||||
&s
|
||||
}
|
||||
```
|
||||
|
||||
This is exactly what Rust's ownership system was built to solve.
|
||||
Rust knows the scope in which the `&s` lives, and as such can prevent it from
|
||||
escaping. However this is a simple case that even a C compiler could plausibly
|
||||
catch. Things get more complicated as code gets bigger and pointers get fed through
|
||||
various functions. Eventually, a C compiler will fall down and won't be able to
|
||||
perform sufficient escape analysis to prove your code unsound. It will consequently
|
||||
be forced to accept your program on the assumption that it is correct.
|
||||
|
||||
This will never happen to Rust. It's up to the programmer to prove to the
|
||||
compiler that everything is sound.
|
||||
|
||||
Of course, Rust's story around ownership is much more complicated than just
|
||||
verifying that references don't escape the scope of their referent. That's
|
||||
because ensuring pointers are always valid is much more complicated than this.
|
||||
For instance in this code,
|
||||
|
||||
```rust
|
||||
let mut data = vec![1, 2, 3];
|
||||
// get an internal reference
|
||||
let x = &data[0];
|
||||
|
||||
// OH NO! `push` causes the backing storage of `data` to be reallocated.
|
||||
// Dangling pointer! Use after free! Alas!
|
||||
// (this does not compile in Rust)
|
||||
data.push(4);
|
||||
|
||||
println!("{}", x);
|
||||
```
|
||||
|
||||
naive scope analysis would be insufficient to prevent this bug, because `data`
|
||||
does in fact live as long as we needed. However it was *changed* while we had
|
||||
a reference into it. This is why Rust requires any references to freeze the
|
||||
referent and its owners.
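
For contrast, here is a sketch of a version the compiler does accept: it copies the element out instead of holding a reference across the `push`, so nothing is frozen when the mutation happens. The function name `first_then_push` is made up for this example.

```rust
/// Reads the first element, then grows the vector.
/// Panics if `data` is empty.
fn first_then_push(data: &mut Vec<i32>) -> i32 {
    // Copy the value out; the temporary borrow of `data` ends right here.
    let x = data[0];

    // No reference into `data` is live any more, so the reallocation that
    // `push` may perform cannot leave anything dangling.
    data.push(4);

    x
}

fn main() {
    let mut data = vec![1, 2, 3];
    let x = first_then_push(&mut data);
    assert_eq!(x, 1);
    assert_eq!(data, [1, 2, 3, 4]);
}
```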
|
||||
|
||||
|
@ -0,0 +1,139 @@
|
||||
% References
|
||||
|
||||
There are two kinds of reference:
|
||||
|
||||
* Shared reference: `&`
|
||||
* Mutable reference: `&mut`
|
||||
|
||||
Which obey the following rules:
|
||||
|
||||
* A reference cannot outlive its referent
|
||||
* A mutable reference cannot be aliased
|
||||
|
||||
To define aliasing, we must define the notion of *paths* and *liveness*.
|
||||
|
||||
|
||||
|
||||
|
||||
# Paths
|
||||
|
||||
If all Rust had were values, then every value would be uniquely owned
|
||||
by a variable or composite structure. From this we naturally derive a *tree*
|
||||
of ownership. The stack itself is the root of the tree, with every variable
|
||||
as its direct children. Each variable's direct children would be their fields
|
||||
(if any), and so on.
|
||||
|
||||
From this view, every value in Rust has a unique *path* in the tree of ownership.
|
||||
References to a value can subsequently be interpreted as a path in this tree.
|
||||
Of particular interest are *prefixes*: `x` is a prefix of `y` if `x` owns `y`.
|
||||
|
||||
However much data doesn't reside on the stack, and we must also accommodate this.
|
||||
Globals and thread-locals are simple enough to model as residing at the bottom
|
||||
of the stack (though we must be careful with mutable globals). Data on
|
||||
the heap poses a different problem.
|
||||
|
||||
If all Rust had on the heap was data uniquely owned by a pointer on the stack,
|
||||
then we can just treat that pointer as a struct that owns the value on
|
||||
the heap. Box, Vec, String, and HashMap are examples of types which uniquely
|
||||
own data on the heap.
|
||||
|
||||
Unfortunately, data on the heap is not *always* uniquely owned. Rc for instance
|
||||
introduces a notion of *shared* ownership. Shared ownership means there is no
|
||||
unique path. A value with no unique path limits what we can do with it. In general, only
|
||||
shared references can be created to these values. However mechanisms which ensure
|
||||
mutual exclusion may establish One True Owner temporarily, establishing a unique path
|
||||
to that value (and therefore all its children).
|
||||
|
||||
The most common way to establish such a path is through *interior mutability*,
|
||||
in contrast to the *inherited mutability* that everything in Rust normally uses.
|
||||
Cell, RefCell, Mutex, and RwLock are all examples of interior mutability types. These
|
||||
types provide exclusive access through runtime restrictions. However it is also
|
||||
possible to establish unique ownership without interior mutability. For instance,
|
||||
if an Rc has refcount 1, then it is safe to mutate or move its internals.
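
Both routes to temporary unique access can be sketched with the standard library: `Rc::get_mut` succeeds only while the Rc is the sole handle, and `RefCell` enforces exclusivity at runtime. The helper names `bump_if_unique` and `bump_shared` are assumptions for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Mutates the value only if this Rc is currently the unique owner.
fn bump_if_unique(rc: &mut Rc<i32>) -> bool {
    match Rc::get_mut(rc) {
        Some(value) => {
            *value += 1;
            true
        }
        None => false,
    }
}

/// Mutates through a shared handle, using RefCell's runtime check.
fn bump_shared(cell: &Rc<RefCell<i32>>) {
    *cell.borrow_mut() += 1;
}

fn main() {
    // Without interior mutability: allowed only while the refcount is 1.
    let mut a = Rc::new(5);
    assert!(bump_if_unique(&mut a));
    let b = Rc::clone(&a);
    assert!(!bump_if_unique(&mut a)); // shared now, so no unique path
    drop(b);
    assert!(bump_if_unique(&mut a)); // unique again
    assert_eq!(*a, 7);

    // With interior mutability: any handle may mutate, checked at runtime.
    let cell = Rc::new(RefCell::new(0));
    let other = Rc::clone(&cell);
    bump_shared(&other);
    assert_eq!(*cell.borrow(), 1);
}
```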
|
||||
|
||||
|
||||
|
||||
|
||||
# Liveness
|
||||
|
||||
Roughly, a reference is *live* at some point in a program if it can be
|
||||
dereferenced. Shared references are always live unless they are literally unreachable
|
||||
(for instance, they reside in freed or leaked memory). Mutable references can be
|
||||
reachable but *not* live through the process of *reborrowing*.
|
||||
|
||||
A mutable reference can be reborrowed to either a shared or mutable reference.
|
||||
Further, the reborrow can produce exactly the same reference, or point to a
|
||||
path it is a prefix of. For instance, a mutable reference can be reborrowed
|
||||
to point to a field of its referent:
|
||||
|
||||
```rust
|
||||
let x = &mut (1, 2);
|
||||
{
|
||||
// reborrow x to a subfield
|
||||
let y = &mut x.0;
|
||||
// y is now live, but x isn't
|
||||
*y = 3;
|
||||
}
|
||||
// y goes out of scope, so x is live again
|
||||
*x = (5, 7);
|
||||
```
|
||||
|
||||
It is also possible to reborrow into *multiple* mutable references, as long as
|
||||
they are *disjoint*: no reference is a prefix of another. Rust
|
||||
explicitly enables this to be done with disjoint struct fields, because
|
||||
disjointness can be statically proven:
|
||||
|
||||
```rust
|
||||
let x = &mut (1, 2);
|
||||
{
|
||||
// reborrow x to two disjoint subfields
|
||||
let y = &mut x.0;
|
||||
let z = &mut x.1;
|
||||
// y and z are now live, but x isn't
|
||||
*y = 3;
|
||||
*z = 4;
|
||||
}
|
||||
// y and z go out of scope, so x is live again
|
||||
*x = (5, 7);
|
||||
```
|
||||
|
||||
However it's often the case that Rust isn't sufficiently smart to prove that
|
||||
multiple borrows are disjoint. *This does not mean it is fundamentally illegal
|
||||
to make such a borrow*, just that Rust isn't as smart as you want.
|
||||
|
||||
To simplify things, we can model variables as a fake type of reference: *owned*
|
||||
references. Owned references have much the same semantics as mutable references:
|
||||
they can be re-borrowed in a mutable or shared manner, which makes them no longer
|
||||
live. Live owned references have the unique property that they can be moved
|
||||
out of (though mutable references *can* be swapped out of). This is
|
||||
only given to *live* owned references because moving its referent would of
|
||||
course invalidate all outstanding references prematurely.
|
||||
|
||||
As a local lint against inappropriate mutation, only variables that are marked
|
||||
as `mut` can be borrowed mutably.
|
||||
|
||||
It is also interesting to note that Box behaves exactly like an owned
|
||||
reference. It can be moved out of, and Rust understands it sufficiently to
|
||||
reason about its paths like a normal variable.
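
As a small illustration, the following sketch moves a value out of a `Box` by dereferencing it. The function name `into_name` and the tuple layout are assumptions made for the example only:

```rust
/// Hypothetical helper: consumes the box and returns the name stored in it.
fn into_name(boxed: Box<(String, u32)>) -> String {
    // Dereferencing moves the tuple out of the heap allocation. The
    // allocation is freed and `boxed` can no longer be used afterwards.
    let (name, _age) = *boxed;
    name
}
```

The same move through an `&mut (String, u32)` would be rejected, which is the difference between an owned and a mutable reference described above.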
|
||||
|
||||
|
||||
|
||||
|
||||
# Aliasing
|
||||
|
||||
With liveness and paths defined, we can now properly define *aliasing*:
|
||||
|
||||
**A mutable reference is aliased if there exists another live reference to it or
|
||||
one of its prefixes.**
|
||||
|
||||
That's it. Super simple right? Except for the fact that it took us two pages
|
||||
to define all of the terms in that defintion. You know: Super. Simple.
|
||||
|
||||
Actually it's a bit more complicated than that. In addition to references,
|
||||
Rust has *raw pointers*: `*const T` and `*mut T`. Raw pointers have no inherent
|
||||
ownership or aliasing semantics. As a result, Rust makes absolutely no effort
|
||||
to track that they are used correctly, and they are wildly unsafe.
|
||||
|
||||
**It is an open question to what degree raw pointers have alias semantics.
|
||||
However it is important for these definitions to be sound that the existence
|
||||
of a raw pointer does not imply some kind of live path.**
|
@ -0,0 +1,124 @@
|
||||
% repr(Rust)
|
||||
|
||||
Rust gives you the following ways to lay out composite data:
|
||||
|
||||
* structs (named product types)
|
||||
* tuples (anonymous product types)
|
||||
* arrays (homogeneous product types)
|
||||
* enums (named sum types -- tagged unions)
|
||||
|
||||
An enum is said to be *C-like* if none of its variants have associated data.
|
||||
|
||||
For all these, individual fields are aligned to their preferred alignment. For
|
||||
primitives this is usually equal to their size. For instance, a u32 will be
|
||||
aligned to a multiple of 32 bits, and a u16 will be aligned to a multiple of 16
|
||||
bits. Composite structures will have a preferred alignment equal to the maximum
|
||||
of their fields' preferred alignment, and a size equal to a multiple of their
|
||||
preferred alignment. This ensures that arrays of T can be correctly iterated
|
||||
by offsetting by their size. So for instance,
|
||||
|
||||
```rust
|
||||
struct A {
|
||||
a: u8,
|
||||
c: u32,
|
||||
b: u16,
|
||||
}
|
||||
```
|
||||
|
||||
will have a size that is a multiple of 32-bits, and 32-bit alignment.
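
These properties can be checked with `mem::size_of` and `mem::align_of`. A minimal sketch, where `layout_of_a` is a helper invented for this example:

```rust
use std::mem;

#[allow(dead_code)]
struct A {
    a: u8,
    c: u32,
    b: u16,
}

/// Returns `(size, alignment)` of `A` in bytes.
fn layout_of_a() -> (usize, usize) {
    (mem::size_of::<A>(), mem::align_of::<A>())
}
```

The alignment matches that of `u32`, the most strictly aligned field, and the size is a multiple of that alignment. The exact size is deliberately not asserted here, since the text goes on to explain that the layout is not otherwise specified.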
|
||||
|
||||
There is *no indirection* for these types; all data is stored contiguously as you would
|
||||
expect in C. However with the exception of arrays (which are densely packed and
|
||||
in-order), the layout of data is not by default specified in Rust. Given the two
|
||||
following struct definitions:
|
||||
|
||||
```rust
|
||||
struct A {
|
||||
a: i32,
|
||||
b: u64,
|
||||
}
|
||||
|
||||
struct B {
|
||||
x: i32,
|
||||
b: u64,
|
||||
}
|
||||
```
|
||||
|
||||
Rust *does* guarantee that two instances of A have their data laid out in exactly
|
||||
the same way. However Rust *does not* guarantee that an instance of A has the same
|
||||
field ordering or padding as an instance of B (in practice there's no *particular*
|
||||
reason why they wouldn't, other than that it's not currently guaranteed).
|
||||
|
||||
With A and B as written, this is basically nonsensical, but several other features
|
||||
of Rust make it desirable for the language to play with data layout in complex ways.
|
||||
|
||||
For instance, consider this struct:
|
||||
|
||||
```rust
|
||||
struct Foo<T, U> {
|
||||
count: u16,
|
||||
data1: T,
|
||||
data2: U,
|
||||
}
|
||||
```
|
||||
|
||||
Now consider the monomorphizations of `Foo<u32, u16>` and `Foo<u16, u32>`. If Rust lays out the
|
||||
fields in the order specified, we expect it to *pad* the values in the struct to satisfy
|
||||
their *alignment* requirements. So if Rust didn't reorder fields, we would expect Rust to
|
||||
produce the following:
|
||||
|
||||
```rust
|
||||
struct Foo<u16, u32> {
|
||||
count: u16,
|
||||
data1: u16,
|
||||
data2: u32,
|
||||
}
|
||||
|
||||
struct Foo<u32, u16> {
|
||||
count: u16,
|
||||
_pad1: u16,
|
||||
data1: u32,
|
||||
data2: u16,
|
||||
_pad2: u16,
|
||||
}
|
||||
```
|
||||
|
||||
The latter case quite simply wastes space. An optimal use of space therefore requires
|
||||
different monomorphizations to have *different field orderings*.
|
||||
|
||||
**Note: this is a hypothetical optimization that is not yet implemented in Rust 1.0**
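
The "no reordering" case can be observed by forcing declaration order with `#[repr(C)]`. This is only a sketch: it assumes a typical target where `u16` is 2-byte aligned and `u32` is 4-byte aligned, and `foo_sizes` is a helper made up for the example:

```rust
use std::mem;

// `#[repr(C)]` pins the fields to declaration order, which is the
// "no reordering" scenario described in the text.
#[allow(dead_code)]
#[repr(C)]
struct Foo<T, U> {
    count: u16,
    data1: T,
    data2: U,
}

/// Returns the sizes of `Foo<u16, u32>` and `Foo<u32, u16>` in bytes.
fn foo_sizes() -> (usize, usize) {
    (
        mem::size_of::<Foo<u16, u32>>(),
        mem::size_of::<Foo<u32, u16>>(),
    )
}
```

Under those assumptions the first instantiation needs 8 bytes and the second 12, the extra 4 bytes being the two `u16`-sized pads shown above.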
|
||||
|
||||
Enums make this consideration even more complicated. Naively, an enum such as:
|
||||
|
||||
```rust
|
||||
enum Foo {
|
||||
A(u32),
|
||||
B(u64),
|
||||
C(u8),
|
||||
}
|
||||
```
|
||||
|
||||
would be laid out as:
|
||||
|
||||
```rust
|
||||
struct FooRepr {
|
||||
data: u64, // this is *really* either a u64, u32, or u8 based on `tag`
|
||||
tag: u8, // 0 = A, 1 = B, 2 = C
|
||||
}
|
||||
```
|
||||
|
||||
And indeed this is approximately how it would be laid out in general
|
||||
(modulo the size and position of `tag`). However there are several cases where
|
||||
such a representation is inefficient. The classic case of this is Rust's
|
||||
"null pointer optimization". Given a pointer that is known to not be null
|
||||
(e.g. `&u32`), an enum can *store* a discriminant bit *inside* the pointer
|
||||
by using null as a special value. The net result is that
|
||||
`size_of::<Option<&T>>() == size_of::<&T>()`
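
A short sketch of how one might check this, with helper names invented for the example:

```rust
use std::mem;

/// Returns true if wrapping `&u32` and `Box<u32>` in `Option` costs no space.
fn option_pointer_is_free() -> bool {
    mem::size_of::<Option<&u32>>() == mem::size_of::<&u32>()
        && mem::size_of::<Option<Box<u32>>>() == mem::size_of::<Box<u32>>()
}

/// For contrast: every bit pattern of `u32` is a valid value, so the tag
/// needs room of its own.
fn option_u32_is_larger() -> bool {
    mem::size_of::<Option<u32>>() > mem::size_of::<u32>()
}
```

Both functions return `true`: the pointer types donate their null value to encode `None`, while `u32` has nothing to donate.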
|
||||
|
||||
There are many types in Rust that are, or contain, "not null" pointers such as
|
||||
`Box<T>`, `Vec<T>`, `String`, `&T`, and `&mut T`. Similarly, one can imagine
|
||||
nested enums pooling their tags into a single discriminant, as they are by
|
||||
definition known to have a limited range of valid values. In principle enums can
|
||||
use fairly elaborate algorithms to cache bits throughout nested types with
|
||||
special constrained representations. As such it is *especially* desirable that
|
||||
we leave enum layout unspecified today.
|
@ -0,0 +1,135 @@
|
||||
% What do Safe and Unsafe really mean?
|
||||
|
||||
Rust cares about preventing the following things:
|
||||
|
||||
* Dereferencing null or dangling pointers
|
||||
* Reading [uninitialized memory][]
|
||||
* Breaking the [pointer aliasing rules][]
|
||||
* Producing invalid primitive values:
|
||||
* dangling/null references
|
||||
* a `bool` that isn't 0 or 1
|
||||
* an undefined `enum` discriminant
|
||||
* a `char` that is a surrogate (0xD800 through 0xDFFF) or larger than char::MAX
|
||||
* A non-utf8 `str`
|
||||
* Unwinding into another language
|
||||
* Causing a [data race][]
|
||||
* Invoking Misc. Undefined Behaviour (in e.g. compiler intrinsics)
|
||||
|
||||
That's it. That's all the Undefined Behaviour in Rust. Libraries are free to
|
||||
declare arbitrary requirements if they could transitively cause memory safety
|
||||
issues, but it all boils down to the above actions. Rust is otherwise
|
||||
quite permissive with respect to other dubious operations. Rust considers it
|
||||
"safe" to:
|
||||
|
||||
* Deadlock
|
||||
* Have a Race Condition
|
||||
* Leak memory
|
||||
* Fail to call destructors
|
||||
* Overflow integers
|
||||
* Delete the production database
|
||||
|
||||
However any program that does such a thing is *probably* incorrect. Rust
|
||||
provides lots of tools to make doing these things rare, but these problems are
|
||||
considered impractical to categorically prevent.
|
||||
|
||||
Rust models the separation between Safe and Unsafe with the `unsafe` keyword.
|
||||
There are several places `unsafe` can appear in Rust today, which can largely be
|
||||
grouped into two categories:
|
||||
|
||||
* There are unchecked contracts here. To declare you understand this, I require
|
||||
you to write `unsafe` elsewhere:
|
||||
* On functions, `unsafe` is declaring the function to be unsafe to call. Users
|
||||
of the function must check the documentation to determine what this means,
|
||||
and then have to write `unsafe` somewhere to identify that they're aware of
|
||||
the danger.
|
||||
* On trait declarations, `unsafe` is declaring that *implementing* the trait
|
||||
is an unsafe operation, as it has contracts that other unsafe code is free to
|
||||
trust blindly.
|
||||
|
||||
* I am declaring that I have, to the best of my knowledge, adhered to the
|
||||
unchecked contracts:
|
||||
* On trait implementations, `unsafe` is declaring that the contract of the
|
||||
`unsafe` trait has been upheld.
|
||||
* On blocks, `unsafe` is declaring any unsafety from an unsafe
|
||||
operation within to be handled, and therefore the parent function is safe.
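
The four positions listed above can be sketched in one place. Everything named here (`ZeroOk`, `zeroed`, `read_u32`, `first_or_default`) is invented for illustration and is not part of any library:

```rust
use std::mem;

/// Unchecked contract on a trait: implementors promise that the all-zero
/// bit pattern is a valid value of the type.
unsafe trait ZeroOk {}

// Trait implementation: we assert that the contract holds for `u32`.
unsafe impl ZeroOk for u32 {}

/// Safe function that trusts the `ZeroOk` contract blindly.
fn zeroed<T: ZeroOk>() -> T {
    // The block declares that the requirements of `mem::zeroed` are met.
    unsafe { mem::zeroed() }
}

/// Unchecked contract on a function: `ptr` must point to a readable `u32`.
unsafe fn read_u32(ptr: *const u32) -> u32 {
    unsafe { *ptr }
}

/// Safe wrapper: the block asserts that the contract of `read_u32` is met.
fn first_or_default(values: &[u32]) -> u32 {
    match values.first() {
        // A reference is always valid to read from, so the contract holds.
        Some(first) => unsafe { read_u32(first as *const u32) },
        None => zeroed(),
    }
}
```

The two declarations (`unsafe trait`, `unsafe fn`) push an obligation onto someone else; the two uses (`unsafe impl`, `unsafe { }`) are where that obligation is discharged.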
|
||||
|
||||
There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for
|
||||
historical reasons and is in the process of being phased out. See the section on
|
||||
[destructors][] for details.
|
||||
|
||||
Some examples of unsafe functions:
|
||||
|
||||
* `slice::get_unchecked` will perform unchecked indexing, allowing memory
|
||||
safety to be freely violated.
|
||||
* `ptr::offset` is an intrinsic that invokes Undefined Behaviour if it is
|
||||
not "in bounds" as defined by LLVM (see the lifetimes section for details).
|
||||
* `mem::transmute` reinterprets some value as having the given type,
|
||||
bypassing type safety in arbitrary ways. (see [conversions][] for details)
|
||||
* All FFI functions are `unsafe` because they can do arbitrary things.
|
||||
C being an obvious culprit, but generally any language can do something
|
||||
that Rust isn't happy about.
|
||||
|
||||
As of Rust 1.0 there are exactly two unsafe traits:
|
||||
|
||||
* `Send` is a marker trait (it has no actual API) that promises implementors
|
||||
are safe to send to another thread.
|
||||
* `Sync` is a marker trait that promises that threads can safely share
|
||||
implementors through a shared reference.
|
||||
|
||||
The need for unsafe traits boils down to the fundamental lack of trust that Unsafe
|
||||
has for Safe. All safe traits are free to declare arbitrary contracts, but because
|
||||
implementing them is a job for Safe, Unsafe can't trust those contracts to actually
|
||||
be upheld.
|
||||
|
||||
For instance Rust has `PartialOrd` and `Ord` traits to try to differentiate
|
||||
between types which can "just" be compared, and those that actually implement a
|
||||
*total* ordering. Pretty much every API that wants to work with data that can be
|
||||
compared *really* wants Ord data. For instance, a sorted map like BTreeMap
|
||||
*doesn't even make sense* for partially ordered types. If you claim to implement
|
||||
Ord for a type, but don't actually provide a proper total ordering, BTreeMap will
|
||||
get *really confused* and start making a total mess of itself. Data that is
|
||||
inserted may be impossible to find!
|
||||
|
||||
But that's ok. BTreeMap is safe, so it guarantees that even if you give it a
|
||||
*completely* garbage Ord implementation, it will still do something *safe*. You
|
||||
won't start reading uninitialized memory or unallocated memory. In fact, BTreeMap
|
||||
manages to not actually lose any of your data. When the map is dropped, all the
|
||||
destructors will be successfully called! Hooray!
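
A minimal sketch of such a garbage implementation follows. The `Chaotic` type and `insert_all` helper are made up for the example, and what the map actually contains afterwards is unspecified; the only thing relied on is that nothing memory-unsafe happens:

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// A key whose ordering is nonsense: every key claims to be less than
/// every other key, including itself.
#[derive(Debug)]
struct Chaotic(u32);

impl PartialEq for Chaotic {
    fn eq(&self, _other: &Self) -> bool {
        false
    }
}

impl Eq for Chaotic {}

impl PartialOrd for Chaotic {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Chaotic {
    fn cmp(&self, _other: &Self) -> Ordering {
        Ordering::Less
    }
}

/// Inserts every key into a fresh map. The map may end up in a confused
/// logical state, but it stays memory safe and drops cleanly.
fn insert_all(keys: &[u32]) -> BTreeMap<Chaotic, u32> {
    let mut map = BTreeMap::new();
    for &key in keys {
        map.insert(Chaotic(key), key);
    }
    map
}
```

Note that all of this is written in Safe Rust. That is exactly why BTreeMap cannot lean on `Ord` for anything safety-critical.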
|
||||
|
||||
However BTreeMap is implemented using a modest spoonful of Unsafe (most collections
|
||||
are). That means that it is not necessarily *trivially true* that a bad Ord
|
||||
implementation will make BTreeMap behave safely. Unsafe must be sure not to rely
|
||||
on Ord *where safety is at stake*, because Ord is provided by Safe, and memory
|
||||
safety is not Safe's responsibility to uphold. *It must be impossible for Safe
|
||||
code to violate memory safety*.
|
||||
|
||||
But wouldn't it be grand if there was some way for Unsafe to trust *some* trait
|
||||
contracts *somewhere*? This is the problem that unsafe traits tackle: by marking
|
||||
*the trait itself* as unsafe *to implement*, Unsafe can trust the implementation
|
||||
to be correct (because Unsafe can trust themself).
|
||||
|
||||
Rust has traditionally avoided making traits unsafe because it makes Unsafe
|
||||
pervasive, which is not desirable. Send and Sync are unsafe because
|
||||
thread safety is a *fundamental property* that Unsafe cannot possibly hope to
|
||||
defend against in the same way it would defend against a bad Ord implementation.
|
||||
The only way to possibly defend against thread-unsafety would be to *not use
|
||||
threading at all*. Making every operation atomic isn't even sufficient, because
|
||||
it's possible for complex invariants to exist between disjoint locations in memory.
|
||||
|
||||
Even concurrent paradigms that are traditionally regarded as Totally Safe like
|
||||
message passing implicitly rely on some notion of thread safety -- are you
|
||||
really message-passing if you send a *pointer*? Send and Sync therefore require
|
||||
some *fundamental* level of trust that Safe code can't provide, so they must be
|
||||
unsafe to implement. To help obviate the pervasive unsafety that this would
|
||||
introduce, Send (resp. Sync) is *automatically* derived for all types composed only
|
||||
of Send (resp. Sync) values. 99% of types are Send and Sync, and 99% of those
|
||||
never actually say it (the remaining 1% is overwhelmingly synchronization
|
||||
primitives).
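
The automatic derivation can be observed with a pair of bound-checking helpers. This is a sketch only: `is_send`, `is_sync`, `Plain`, and `RawHandle` are names invented for the example, and the safety argument for `RawHandle` is an assumption stated in the comment:

```rust
/// Compiles only if `T` is `Send`.
fn is_send<T: Send>() -> bool {
    true
}

/// Compiles only if `T` is `Sync`.
fn is_sync<T: Sync>() -> bool {
    true
}

/// Composed only of `Send + Sync` fields, so it is both without saying so.
#[allow(dead_code)]
struct Plain {
    id: u32,
    name: String,
}

/// Raw pointers are neither `Send` nor `Sync`, so this type opts out of
/// the automatic derivation.
#[allow(dead_code)]
struct RawHandle(*mut u8);

// Assumption for this sketch: the pointer is an opaque token that is never
// dereferenced, so moving it to another thread is harmless. That promise is
// ours to keep, which is why the impl must be marked `unsafe`.
unsafe impl Send for RawHandle {}
```

`is_send::<Plain>()` and `is_sync::<Plain>()` compile without any impl being written, while `is_sync::<RawHandle>()` would be rejected at compile time.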
|
||||
|
||||
|
||||
|
||||
[pointer aliasing rules]: lifetimes.html#references
|
||||
[uninitialized memory]: uninitialized.html
|
||||
[data race]: concurrency.html
|
||||
[destructors]: raii.html
|
||||
[conversions]: conversions.html
|
@ -0,0 +1,177 @@
|
||||
% Subtyping and Variance
|
||||
|
||||
Although Rust doesn't have any notion of inheritance, it *does* include subtyping.
|
||||
In Rust, subtyping derives entirely from *lifetimes*. Since lifetimes are scopes,
|
||||
we can partially order them based on a *contains* (outlives) relationship. We
|
||||
can even express this as a generic bound: `T: 'a` specifies that whatever scope `T`
|
||||
is valid for must contain the scope `'a` ("T outlives `'a`").
|
||||
|
||||
We can then define subtyping on lifetimes in terms of that relationship: if `'a: 'b`
|
||||
("a contains b" or "a outlives b"), then `'a` is a subtype of `'b`. This is a
|
||||
large source of confusion, because it seems intuitively backwards to many:
|
||||
the bigger scope is a *sub type* of the smaller scope.
|
||||
|
||||
This does in fact make sense. The intuitive reason for this is that if you expect an
|
||||
`&'a u8`, then it's totally fine for me to hand you an `&'static u8`, in the same way
|
||||
that if you expect an Animal in Java, it's totally fine for me to hand you a Cat.
|
||||
Cats are just Animals *and more*, just as `'static` is just `'a` *and more*.
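
In code, this looks like the following sketch, where `first_or` and `first_or_zero` are hypothetical helpers:

```rust
static ZERO: u8 = 0;

/// Both arguments must share a single lifetime `'a`.
fn first_or<'a>(data: &'a [u8], fallback: &'a u8) -> &'a u8 {
    data.first().unwrap_or(fallback)
}

/// Returns the first byte of `data`, or zero if it is empty.
fn first_or_zero(data: &[u8]) -> u8 {
    // `&ZERO` is a `&'static u8`. It is accepted where `&'a u8` is expected
    // because `'static` outlives every `'a`.
    *first_or(data, &ZERO)
}
```

Nothing special is written at the call site; the longer-lived reference is silently treated as a shorter-lived one.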
|
||||
|
||||
(Note, the subtyping relationship and typed-ness of lifetimes is a fairly arbitrary
|
||||
construct that some disagree with. I just find that it simplifies this analysis.)
|
||||
|
||||
Higher-ranked lifetimes are also subtypes of every concrete lifetime. This is because
|
||||
taking an arbitrary lifetime is strictly more general than taking a specific one.
|
||||
|
||||
|
||||
|
||||
# Variance
|
||||
|
||||
Variance is where things get really harsh.
|
||||
|
||||
Variance is a property that *type constructors* have. A type constructor in Rust
|
||||
is a generic type with unbound arguments. For instance `Vec` is a type constructor
|
||||
that takes a `T` and returns a `Vec<T>`. `&` and `&mut` are type constructors that
|
||||
take a lifetime and a type.
|
||||
|
||||
A type constructor's *variance* is how the subtypes of its inputs affect the
|
||||
subtypes of its outputs. There are two kinds of variance in Rust:
|
||||
|
||||
* F is *variant* if `T` being a subtype of `U` implies `F<T>` is a subtype of `F<U>`
|
||||
* F is *invariant* otherwise (no subtyping relation can be derived)
|
||||
|
||||
(For those of you who are familiar with variance from other languages, what we refer
|
||||
to as "just" variance is in fact *covariance*. Rust does not have contravariance.
|
||||
Historically Rust did have some contravariance but it was scrapped due to poor
|
||||
interactions with other features.)
|
||||
|
||||
Some important variances:
|
||||
|
||||
* `&` is variant (as is `*const` by metaphor)
|
||||
* `&mut` is invariant (as is `*mut` by metaphor)
|
||||
* `Fn(T) -> U` is invariant with respect to `T`, but variant with respect to `U`
|
||||
* `Box`, `Vec`, and all other collections are variant
|
||||
* `UnsafeCell`, `Cell`, `RefCell`, `Mutex` and all "interior mutability"
|
||||
types are invariant
|
||||
|
||||
To understand why these variances are correct and desirable, we will consider several
|
||||
examples. We have already covered why `&` should be variant when introducing subtyping:
|
||||
it's desirable to be able to pass longer-lived things where shorter-lived things are
|
||||
needed.
|
||||
|
||||
To see why `&mut` should be invariant, consider the following code:
|
||||
|
||||
```rust
|
||||
fn main() {
|
||||
let mut forever_str: &'static str = "hello";
|
||||
{
|
||||
let string = String::from("world");
|
||||
overwrite(&mut forever_str, &mut &*string);
|
||||
}
|
||||
println!("{}", forever_str);
|
||||
}
|
||||
|
||||
fn overwrite<T: Copy>(input: &mut T, new: &mut T) {
|
||||
*input = *new;
|
||||
}
|
||||
```
|
||||
|
||||
The signature of `overwrite` is clearly valid: it takes mutable references to two values
|
||||
of the same type, and overwrites one with the other. We have seen already that `&` is
|
||||
variant, and `'static` is a subtype of *any* `'a`, so `&'static str` is a
|
||||
subtype of `&'a str`. Therefore, if `&mut` was
|
||||
*also* variant, then the lifetime of the `&'static str` would successfully be
|
||||
"shrunk" down to the shorter lifetime of the string, and `overwrite` would be
|
||||
called successfully. The string would subsequently be dropped, and `forever_str`
|
||||
would point to freed memory when we print it!
|
||||
|
||||
Therefore `&mut` should be invariant. This is the general theme of variance vs
|
||||
invariance: if variance would allow you to *store* a short-lived value in a
|
||||
longer-lived slot, then you must be invariant.
|
||||
|
||||
`Box` and `Vec` are interesting cases because they're variant, but you can
|
||||
definitely store values in them! This is fine because *you can only store values
|
||||
in them through a mutable reference*! The mutable reference makes the whole type
|
||||
invariant, and therefore prevents you from getting in trouble.
|
||||
|
||||
Being variant allows them to be weakened when shared immutably (so you can pass
|
||||
a `&Box<&'static str>` where a `&Box<&'a str>` is expected). It also allows you to
|
||||
forever weaken the type by moving it into a weaker slot. That is, you can do:
|
||||
|
||||
```rust
|
||||
fn get_box<'a>(_: &'a u8) -> Box<&'a str> {
|
||||
// string literals are `&'static str`s
|
||||
Box::new("hello")
|
||||
}
|
||||
```
|
||||
|
||||
which is fine because unlike the mutable borrow case, there's no one else who
|
||||
"remembers" the old lifetime in the box.
|
||||
|
||||
The variance of the cell types similarly follows. `&` is like an `&mut` for a
|
||||
cell, because you can still store values in them through an `&`. Therefore cells
|
||||
must be invariant to avoid lifetime smuggling.
|
||||
|
||||
`Fn` is the most subtle case, because it has mixed variance. To see why
|
||||
`Fn(T) -> U` should be invariant over T, consider the following function
|
||||
signature:
|
||||
|
||||
```rust
|
||||
// 'a is derived from some parent scope
|
||||
fn foo(&'a str) -> usize;
|
||||
```
|
||||
|
||||
This signature claims that it can handle any &str that lives *at least* as long
|
||||
as `'a`. Now if this signature was variant with respect to `&str`, that would mean
|
||||
|
||||
```rust
|
||||
fn foo(&'static str) -> usize;
|
||||
```
|
||||
|
||||
could be provided in its place, as it would be a subtype. However this function
|
||||
has a *stronger* requirement: it says that it can *only* handle `&'static str`s,
|
||||
and nothing else. Therefore functions are not variant over their arguments.
|
||||
|
||||
To see why `Fn(T) -> U` should be *variant* over U, consider the following
|
||||
function signature:
|
||||
|
||||
```rust
|
||||
// 'a is derived from some parent scope
|
||||
fn foo(usize) -> &'a str;
|
||||
```
|
||||
|
||||
This signature claims that it will return something that outlives `'a`. It is
|
||||
therefore completely reasonable to provide
|
||||
|
||||
```rust
|
||||
fn foo(usize) -> &'static str;
|
||||
```
|
||||
|
||||
in its place. Therefore functions *are* variant over their return type.
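
A compiling sketch of the return-type case, using function pointers. The names `label`, `describe`, and `describe_in_short_scope` are assumptions made for this example:

```rust
/// Returns a string literal, so the result is `&'static str`.
fn label(n: usize) -> &'static str {
    if n == 0 {
        "zero"
    } else {
        "many"
    }
}

/// Expects a function whose result lives at least as long as `'a`.
fn describe<'a>(f: fn(usize) -> &'a str, n: usize, _scope: &'a str) -> &'a str {
    f(n)
}

fn describe_in_short_scope(n: usize) -> String {
    let scope = String::from("short-lived");
    // `label` promises `&'static str`, yet it is accepted where only the
    // shorter lifetime of `scope` is required.
    describe(label, n, &scope).to_owned()
}
```

A function that promises more about its result than the caller needs is always an acceptable substitute.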
|
||||
|
||||
`*const` has the exact same semantics as `&`, so variance follows. `*mut` on the
|
||||
other hand can dereference to an &mut whether shared or not, so it is marked
|
||||
as invariant in analogy to cells.
|
||||
|
||||
This is all well and good for the types the standard library provides, but
|
||||
how is variance determined for types that *you* define? A struct, informally
|
||||
speaking, inherits the variance of its fields. If a struct `Foo`
|
||||
has a generic argument `A` that is used in a field `a`, then Foo's variance
|
||||
over `A` is exactly `a`'s variance. However this is complicated if `A` is used
|
||||
in multiple fields.
|
||||
|
||||
* If all uses of A are variant, then Foo is variant over A
|
||||
* Otherwise, Foo is invariant over A
|
||||
|
||||
```rust
|
||||
struct Foo<'a, 'b, A, B, C, D, E, F, G, H> {
|
||||
a: &'a A, // variant over 'a and A
|
||||
b: &'b mut B, // invariant over 'b and B
|
||||
c: *const C, // variant over C
|
||||
d: *mut D, // invariant over D
|
||||
e: Vec<E>, // variant over E
|
||||
f: Cell<F>, // invariant over F
|
||||
g: G, // variant over G
|
||||
h1: H, // would also be variant over H except...
|
||||
h2: Cell<H>, // invariant over H, because invariance wins
|
||||
}
|
||||
```
|
@ -0,0 +1,29 @@
|
||||
% Transmutes
|
||||
|
||||
Get out of our way type system! We're going to reinterpret these bits or die
|
||||
trying! Even though this book is all about doing things that are unsafe, I really
|
||||
can't emphasize enough that you should deeply think about finding Another Way than the
|
||||
operations covered in this section. This is really, truly, the most horribly
|
||||
unsafe thing you can do in Rust. The guardrails here are dental floss.
|
||||
|
||||
`mem::transmute<T, U>` takes a value of type `T` and reinterprets it to have
|
||||
type `U`. The only restriction is that the `T` and `U` are verified to have the
|
||||
same size. The ways to cause Undefined Behaviour with this are mind boggling.
|
||||
|
||||
* First and foremost, creating an instance of *any* type with an invalid state
|
||||
is going to cause arbitrary chaos that can't really be predicted.
|
||||
* Transmute has an overloaded return type. If you do not specify the return type
|
||||
it may produce a surprising type to satisfy inference.
|
||||
* Making a primitive with an invalid value is UB
|
||||
* Transmuting between non-repr(C) types is UB
|
||||
* Transmuting an & to &mut is UB
|
||||
* Transmuting to a reference without an explicitly provided lifetime
|
||||
produces an [unbound lifetime](lifetimes.html#unbounded-lifetimes)
|
||||
|
||||
`mem::transmute_copy<T, U>` somehow manages to be *even more* wildly unsafe than
|
||||
this. It copies `size_of::<U>()` bytes out of an `&T` and interprets them as a `U`.
|
||||
The size check that `mem::transmute` has is gone (as it may be valid to copy
|
||||
out a prefix), though it is Undefined Behaviour for `U` to be larger than `T`.
|
||||
|
||||
Also of course you can get most of the functionality of these functions using
|
||||
pointer casts.
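
As a rough sketch of that equivalence, here are two ways to view the bits of an `f32` as a `u32`. Both helper names are invented for the example, and the types were chosen because they share size and alignment and every bit pattern of `u32` is valid:

```rust
use std::mem;

/// Reinterprets the bits of `x` using `mem::transmute`.
fn bits_via_transmute(x: f32) -> u32 {
    // Sound here only because `f32` and `u32` have the same size and any
    // bit pattern is a valid `u32`.
    unsafe { mem::transmute::<f32, u32>(x) }
}

/// Reinterprets the bits of `x` by casting a pointer and reading through it.
fn bits_via_cast(x: f32) -> u32 {
    // No size check is performed for us here: `*const f32` is simply
    // relabelled as `*const u32` and then read.
    unsafe { *(&x as *const f32 as *const u32) }
}
```

For this particular conversion the safe `f32::to_bits` method exists and is the Other Way one should reach for first.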
|
@ -0,0 +1,37 @@
|
||||
% Unbounded Lifetimes
|
||||
|
||||
Unsafe code can often end up producing references or lifetimes out of thin air.
|
||||
Such lifetimes come into the world as *unbounded*. The most common source of this
|
||||
is dereferencing a raw pointer, which produces a reference with an unbounded lifetime.
|
||||
Such a lifetime becomes as big as context demands. This is in fact more powerful
|
||||
than simply becoming `'static`, because for instance `&'static &'a T`
|
||||
will fail to typecheck, but the unbound lifetime will perfectly mold into
|
||||
`&'a &'a T` as needed. However for most intents and purposes, such an unbounded
|
||||
lifetime can be regarded as `'static`.
|
||||
|
||||
Almost no reference is `'static`, so this is probably wrong. `transmute` and
|
||||
`transmute_copy` are the two other primary offenders. One should endeavour to
|
||||
bound an unbounded lifetime as quickly as possible, especially across function
|
||||
boundaries.
|
||||
|
||||
Given a function, any output lifetimes that don't derive from inputs are
|
||||
unbounded. For instance:
|
||||
|
||||
```rust
|
||||
fn get_str<'a>() -> &'a str;
|
||||
```
|
||||
|
||||
will produce an `&str` with an unbounded lifetime. The easiest way to avoid
|
||||
unbounded lifetimes is to use lifetime elision at the function boundary.
|
||||
If an output lifetime is elided, then it *must* be bounded by an input lifetime.
|
||||
Of course it might be bounded by the *wrong* lifetime, but this will usually
|
||||
just cause a compiler error, rather than allow memory safety to be trivially
|
||||
violated.
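
The following sketch contrasts the two shapes. `deref_unbounded` and `first_of` are hypothetical names: the first leaks an unbounded lifetime to its caller, while the second uses elision so the result is tied to its input:

```rust
/// Unbounded: nothing in the signature connects `'a` to an input, so the
/// caller may pick any lifetime at all, including `'static`.
unsafe fn deref_unbounded<'a>(ptr: *const u32) -> &'a u32 {
    unsafe { &*ptr }
}

/// Bounded: the elided output lifetime is that of `owner`, so the result
/// cannot outlive the slice it points into.
fn first_of(owner: &[u32]) -> Option<&u32> {
    if owner.is_empty() {
        return None;
    }
    let ptr = owner.as_ptr();
    // The slice is non-empty, so `ptr` points at a valid `u32`. Returning
    // the reference through this signature bounds its lifetime at once.
    Some(unsafe { deref_unbounded(ptr) })
}
```

Keeping the unbounded reference inside a function whose signature bounds it means the mistake of holding it for too long becomes a compile error for callers of `first_of`.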
|
||||
|
||||
Within a function, bounding lifetimes is more error-prone. The safest and easiest
|
||||
way to bound a lifetime is to return it from a function with a bound lifetime.
|
||||
However if this is unacceptable, the reference can be placed in a location with
|
||||
a specific lifetime. Unfortunately it's impossible to name all lifetimes involved
|
||||
in a function. To get around this, you can in principle use `copy_lifetime`, though
|
||||
these are unstable due to their awkward nature and questionable utility.
|
||||
|
@ -0,0 +1,86 @@
|
||||
% Unchecked Uninitialized Memory
|
||||
|
||||
One interesting exception to this rule is working with arrays. Safe Rust doesn't
|
||||
permit you to partially initialize an array. When you initialize an array, you
|
||||
can either set every value to the same thing with `let x = [val; N]`, or you can
|
||||
specify each member individually with `let x = [val1, val2, val3]`.
|
||||
Unfortunately this is pretty rigid, especially if you need to initialize your
|
||||
array in a more incremental or dynamic way.
|
||||
|
||||
Unsafe Rust gives us a powerful tool to handle this problem:
|
||||
`mem::uninitialized`. This function pretends to return a value when really
|
||||
it does nothing at all. Using it, we can convince Rust that we have initialized
|
||||
a variable, allowing us to do trickier things with conditional and incremental
|
||||
initialization.
|
||||
|
||||
Unfortunately, this opens us up to all kinds of problems. Assignment has a
|
||||
different meaning to Rust based on whether it believes that a variable is
|
||||
initialized or not. If it's uninitialized, then Rust will semantically just
|
||||
memcopy the bits over the uninitialized ones, and do nothing else. However if Rust
|
||||
believes a value to be initialized, it will try to `Drop` the old value!
|
||||
Since we've tricked Rust into believing that the value is initialized, we
|
||||
can no longer safely use normal assignment.
|
||||
|
||||
This is also a problem if you're working with a raw system allocator, which
|
||||
returns a pointer to uninitialized memory.
|
||||
|
||||
To handle this, we must use the `ptr` module. In particular, it provides
|
||||
three functions that allow us to assign bytes to a location in memory without
|
||||
evaluating the old value: `write`, `copy`, and `copy_nonoverlapping`.
|
||||
|
||||
* `ptr::write(ptr, val)` takes a `val` and moves it into the address pointed
|
||||
to by `ptr`.
|
||||
* `ptr::copy(src, dest, count)` copies the bits that `count` T's would occupy
|
||||
from src to dest. (this is equivalent to memmove -- note that the argument
|
||||
order is reversed!)
|
||||
* `ptr::copy_nonoverlapping(src, dest, count)` does what `copy` does, but a
|
||||
little faster on the assumption that the two ranges of memory don't overlap.
|
||||
(this is equivalent to memcpy -- note that the argument order is reversed!)
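
A small sketch of all three functions on a fixed-size buffer of `u32` (a type with no destructor, so nothing here depends on drop behaviour). The helper names are made up for the example:

```rust
use std::ptr;

/// Shifts the first three elements one slot to the right, then writes
/// `value` into slot 0. The last element is overwritten.
fn shift_right_and_insert(buf: &mut [u32; 4], value: u32) {
    unsafe {
        let base = buf.as_mut_ptr();
        // Source and destination ranges overlap, so `copy` is required.
        ptr::copy(base, base.add(1), 3);
        // Overwrite slot 0 without reading or dropping what was there.
        ptr::write(base, value);
    }
}

/// Copies `src` into a fresh array.
fn duplicate(src: &[u32; 4]) -> [u32; 4] {
    let mut dst = [0u32; 4];
    unsafe {
        // `src` and `dst` are distinct arrays, so the ranges cannot overlap.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), 4);
    }
    dst
}
```

Note the source-then-destination argument order in both copy calls, the reverse of C's `memmove` and `memcpy`.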
|
||||
|
||||
It should go without saying that these functions, if misused, will cause serious
|
||||
havoc or just straight up Undefined Behaviour. The only things that these
|
||||
functions *themselves* require is that the locations you want to read and write
|
||||
are allocated. However the ways writing arbitrary bits to arbitrary
|
||||
locations of memory can break things are basically uncountable!
|
||||
|
||||
Putting this all together, we get the following:
|
||||
|
||||
```rust
|
||||
fn main() {
|
||||
use std::{mem, ptr};
|
||||
|
||||
// size of the array is hard-coded but easy to change. This means we can't
|
||||
// use [a, b, c] syntax to initialize the array, though!
|
||||
const SIZE: usize = 10;
|
||||
|
||||
let mut x: [Box<u32>; SIZE];
|
||||
|
||||
unsafe {
|
||||
// convince Rust that x is Totally Initialized
|
||||
x = mem::uninitialized();
|
||||
for i in 0..SIZE {
|
||||
// very carefully overwrite each index without reading it
|
||||
// NOTE: exception safety is not a concern; Box can't panic
|
||||
ptr::write(&mut x[i], Box::new(i as u32));
|
||||
}
|
||||
}
|
||||
|
||||
println!("{:?}", x);
|
||||
}
|
||||
```
|
||||
|
||||
It's worth noting that you don't need to worry about ptr::write-style
|
||||
shenanigans with types which don't implement Drop or
|
||||
contain Drop types, because Rust knows not to try to Drop them. Similarly you
|
||||
should be able to assign to fields of partially initialized structs
|
||||
directly if those fields don't contain any Drop types.
|
||||
|
||||
However when working with uninitialized memory you need to be ever-vigilant for
|
||||
Rust trying to Drop values you make like this before they're fully initialized.
|
||||
Every control path through that variable's scope must initialize the value
|
||||
before it ends, if it has a destructor.
|
||||
*[This includes code panicking](unwinding.html)*.
|
||||
|
||||
And that's about it for working with uninitialized memory! Basically nothing
|
||||
anywhere expects to be handed uninitialized memory, so if you're going to pass
|
||||
it around at all, be sure to be *really* careful.
|
@ -0,0 +1,117 @@
|
||||
% Allocating Memory
|
||||
|
||||
So:
|
||||
|
||||
```rust
|
||||
#![feature(heap_api)]
|
||||
|
||||
use std::rt::heap;
|
||||
use std::mem;
|
||||
|
||||
impl<T> Vec<T> {
|
||||
fn new() -> Self {
|
||||
assert!(mem::size_of::<T>() != 0, "We're not ready to handle ZSTs");
|
||||
unsafe {
|
||||
// need to cast EMPTY to the actual ptr type we want, let
|
||||
// inference handle it.
|
||||
Vec { ptr: Unique::new(heap::EMPTY as *mut _), len: 0, cap: 0 }
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
I slipped in that assert there because zero-sized types will require some
|
||||
special handling throughout our code, and I want to defer the issue for now.
|
||||
Without this assert, some of our early drafts will do some Very Bad Things.
|
||||
|
||||
Next we need to figure out what to actually do when we *do* want space. For that,
|
||||
we'll need to use the rest of the heap APIs. These basically allow us to
|
||||
talk directly to Rust's instance of jemalloc.
|
||||
|
||||
We'll also need a way to handle out-of-memory conditions. The standard library
|
||||
calls the `abort` intrinsic, but calling intrinsics from normal Rust code is a
|
||||
pretty bad idea. Unfortunately, the `abort` exposed by the standard library
|
||||
allocates. Not something we want to do during `oom`! Instead, we'll call
|
||||
`std::process::exit`.
|
||||
|
||||
```rust
|
||||
fn oom() {
|
||||
::std::process::exit(-9999);
|
||||
}
|
||||
```
|
||||
|
||||
Okay, now we can write growing. Roughly, we want to have this logic:
|
||||
|
||||
```text
|
||||
if cap == 0:
|
||||
allocate()
|
||||
cap = 1
|
||||
else
|
||||
reallocate
|
||||
cap *= 2
|
||||
```
|
||||
|
||||
Rust's only supported allocator API is so low level that we'll need to
|
||||
do a fair bit of extra work, though. We also need to guard against some special
|
||||
conditions that can occur with really large allocations. In particular, we index
|
||||
into arrays using unsigned integers, but `ptr::offset` takes signed integers. This
|
||||
means Bad Things will happen if we ever manage to grow to contain more than
|
||||
`isize::MAX` elements. Thankfully, this isn't something we need to worry about
|
||||
in most cases.
|
||||
|
||||
On 64-bit targets we're artificially limited to only 48 bits, so we'll run out
|
||||
of memory far before we reach that point. However on 32-bit targets, particularly
|
||||
those with extensions to use more of the address space, it's theoretically possible
|
||||
to successfully allocate more than `isize::MAX` bytes of memory. Still, we only
|
||||
really need to worry about that if we're allocating elements that are a byte large.
|
||||
Anything else will use up too much space.
|
||||
|
||||
However since this is a tutorial, we're not going to be particularly optimal here,
|
||||
and just unconditionally check, rather than use clever platform-specific `cfg`s.
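
The size arithmetic described above can be sketched on its own, separately from any allocator. This is only an illustration: `doubled_size` is a hypothetical helper, and it uses checked multiplication rather than the single combined assert used in this chapter.

```rust
/// Given the current capacity and element size, returns the doubled
/// capacity and its size in bytes, or `None` if the request would
/// overflow or exceed `isize::MAX` bytes. (Hypothetical helper.)
fn doubled_size(cap: usize, elem_size: usize) -> Option<(usize, usize)> {
    // An empty buffer grows to one element, otherwise capacity doubles.
    let new_cap = if cap == 0 { 1 } else { cap.checked_mul(2)? };
    // The byte count must not overflow `usize`...
    let new_bytes = new_cap.checked_mul(elem_size)?;
    // ...and must stay within what a signed pointer offset can express.
    if new_bytes > isize::MAX as usize {
        return None;
    }
    Some((new_cap, new_bytes))
}
```

Checking the byte count against `isize::MAX` also bounds the element count for any non-zero element size, which is why one comparison can stand in for both checks.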
|
||||
|
||||
```rust
|
||||
fn grow(&mut self) {
|
||||
// this is all pretty delicate, so let's say it's all unsafe
|
||||
unsafe {
|
||||
let align = mem::min_align_of::<T>();
|
||||
let elem_size = mem::size_of::<T>();
|
||||
|
||||
let (new_cap, ptr) = if self.cap == 0 {
|
||||
let ptr = heap::allocate(elem_size, align);
|
||||
(1, ptr)
|
||||
} else {
|
||||
// as an invariant, we can assume that `self.cap < isize::MAX`,
|
||||
// so this doesn't need to be checked.
|
||||
let new_cap = self.cap * 2;
|
||||
// Similarly this can't overflow due to previously allocating this
|
||||
let old_num_bytes = self.cap * elem_size;
|
||||
|
||||
// check that the new allocation doesn't exceed `isize::MAX` at all
|
||||
// regardless of the actual size of the capacity. This combines the
|
||||
// `new_cap <= isize::MAX` and `new_num_bytes <= usize::MAX` checks
|
||||
// we need to make. We lose the ability to allocate e.g. 2/3rds of
|
||||
// the address space with a single Vec of i16's on 32-bit though.
|
||||
// Alas, poor Yorick -- I knew him, Horatio.
|
||||
assert!(old_num_bytes <= (::std::isize::MAX as usize) / 2,
|
||||
"capacity overflow");
|
||||
|
||||
let new_num_bytes = old_num_bytes * 2;
|
||||
let ptr = heap::reallocate(*self.ptr as *mut _,
|
||||
old_num_bytes,
|
||||
new_num_bytes,
|
||||
align);
|
||||
(new_cap, ptr)
|
||||
};
|
||||
|
||||
// If allocate or reallocate fail, we'll get `null` back
|
||||
if ptr.is_null() { oom(); }
|
||||
|
||||
self.ptr = Unique::new(ptr as *mut _);
|
||||
self.cap = new_cap;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Nothing particularly tricky here. Just computing sizes and alignments and doing
|
||||
some careful multiplication checks.
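
The `heap` API used here is unstable, so as a rough point of comparison here is the same growth strategy written against the stable `std::alloc` functions. This is a sketch under assumptions: `RawBuf` is a hypothetical name, `Layout::array` is used to do the overflow checking, and zero-sized types are rejected just like in our `new`.

```rust
use std::alloc::{self, Layout};
use std::mem;
use std::ptr::NonNull;

/// Hypothetical stand-in for the (ptr, cap) pair, using stable APIs.
pub struct RawBuf<T> {
    ptr: NonNull<T>,
    cap: usize,
}

impl<T> RawBuf<T> {
    pub fn new() -> Self {
        assert!(mem::size_of::<T>() != 0, "ZSTs are not handled in this sketch");
        // A dangling, well-aligned pointer plays the role of "unallocated".
        RawBuf { ptr: NonNull::dangling(), cap: 0 }
    }

    pub fn cap(&self) -> usize { self.cap }

    pub fn grow(&mut self) {
        let new_cap = if self.cap == 0 {
            1
        } else {
            self.cap.checked_mul(2).expect("capacity overflow")
        };
        // `Layout::array` fails if the total size would exceed `isize::MAX`.
        let new_layout = Layout::array::<T>(new_cap).expect("capacity overflow");

        let new_ptr = if self.cap == 0 {
            unsafe { alloc::alloc(new_layout) }
        } else {
            let old_layout = Layout::array::<T>(self.cap).unwrap();
            unsafe {
                alloc::realloc(self.ptr.as_ptr() as *mut u8, old_layout, new_layout.size())
            }
        };

        // A null pointer means the allocation failed; report it and abort.
        self.ptr = match NonNull::new(new_ptr as *mut T) {
            Some(p) => p,
            None => alloc::handle_alloc_error(new_layout),
        };
        self.cap = new_cap;
    }
}

impl<T> Drop for RawBuf<T> {
    fn drop(&mut self) {
        // Only free memory that was actually allocated.
        if self.cap != 0 {
            let layout = Layout::array::<T>(self.cap).unwrap();
            unsafe { alloc::dealloc(self.ptr.as_ptr() as *mut u8, layout) }
        }
    }
}
```

Calling `grow` repeatedly takes the capacity through 1, 2, 4, and so on, and dropping the buffer releases the allocation exactly once.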
|
||||
|
@ -0,0 +1,29 @@
|
||||
% Deallocating
|
||||
|
||||
Next we should implement Drop so that we don't massively leak tons of resources.
|
||||
The easiest way is to just call `pop` until it yields None, and then deallocate
|
||||
our buffer. Note that calling `pop` is unneeded if `T: !Drop`. In theory we can
|
||||
ask Rust if T needs_drop and omit the calls to `pop`. However in practice LLVM
|
||||
is *really* good at removing simple side-effect free code like this, so I wouldn't
|
||||
bother unless you notice it's not being stripped (in this case it is).
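
For reference, asking whether `T` needs dropping looks roughly like this on current stable Rust. It assumes `std::mem::needs_drop`, which this chapter does not otherwise use, and `elements_need_drop` is a hypothetical wrapper.

```rust
use std::mem;

/// Hypothetical helper: reports whether dropping a `T` can do anything.
/// The answer is allowed to be conservatively `true`, so treat it as an
/// optimization hint only.
fn elements_need_drop<T>() -> bool {
    mem::needs_drop::<T>()
}
```

A `Drop` impl could skip the `pop` loop when this returns `false` and go straight to freeing the buffer.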
|
||||
|
||||
We must not call `heap::deallocate` when `self.cap == 0`, as in this case we haven't
|
||||
actually allocated any memory.
|
||||
|
||||
|
||||
```rust
|
||||
impl<T> Drop for Vec<T> {
|
||||
fn drop(&mut self) {
|
||||
if self.cap != 0 {
|
||||
while let Some(_) = self.pop() { }
|
||||
|
||||
let align = mem::min_align_of::<T>();
|
||||
let elem_size = mem::size_of::<T>();
|
||||
let num_bytes = elem_size * self.cap;
|
||||
unsafe {
|
||||
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
@ -0,0 +1,40 @@
|
||||
% Deref
|
||||
|
||||
Alright! We've got a decent minimal ArrayStack implemented. We can push, we can
|
||||
pop, and we can clean up after ourselves. However there's a whole mess of functionality
|
||||
we'd reasonably want. In particular, we have a proper array, but none of the slice
|
||||
functionality. That's actually pretty easy to solve: we can implement `Deref<Target=[T]>`.
|
||||
This will magically make our Vec coerce to and behave like a slice in all sorts of
|
||||
conditions.
|
||||
|
||||
All we need is `slice::from_raw_parts`.
|
||||
|
||||
```rust
|
||||
use std::ops::Deref;
|
||||
|
||||
impl<T> Deref for Vec<T> {
|
||||
type Target = [T];
|
||||
fn deref(&self) -> &[T] {
|
||||
unsafe {
|
||||
::std::slice::from_raw_parts(*self.ptr, self.len)
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And let's do DerefMut too:
|
||||
|
||||
```rust
|
||||
use std::ops::DerefMut;
|
||||
|
||||
impl<T> DerefMut for Vec<T> {
|
||||
fn deref_mut(&mut self) -> &mut [T] {
|
||||
unsafe {
|
||||
::std::slice::from_raw_parts_mut(*self.ptr, self.len)
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Now we have `len`, `first`, `last`, indexing, slicing, sorting, `iter`, `iter_mut`,
|
||||
and all other sorts of bells and whistles provided by slice. Sweet!
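
To see the effect in isolation, here is a small safe sketch. `SmallStack` is a hypothetical fixed-capacity type, not our Vec, and it only exists to show how much a `Deref<Target=[T]>` impl buys you.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical fixed-capacity stack: only the first `len` items are live.
struct SmallStack {
    items: [i32; 8],
    len: usize,
}

impl Deref for SmallStack {
    type Target = [i32];
    fn deref(&self) -> &[i32] {
        &self.items[..self.len]
    }
}

impl DerefMut for SmallStack {
    fn deref_mut(&mut self) -> &mut [i32] {
        &mut self.items[..self.len]
    }
}

/// Uses slice methods that `SmallStack` never defined itself.
fn demo() -> (usize, Option<i32>, Vec<i32>) {
    let mut s = SmallStack { items: [3, 1, 2, 0, 0, 0, 0, 0], len: 3 };
    s.sort(); // comes from `[i32]` through `DerefMut`
    (s.len(), s.first().copied(), s.iter().copied().collect())
}
```

None of `sort`, `len`, `first`, or `iter` is written on `SmallStack`; method resolution finds them on the slice.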
|
@ -0,0 +1,318 @@
|
||||
% Drain
|
||||
|
||||
Let's move on to Drain. Drain is largely the same as IntoIter, except that
|
||||
instead of consuming the Vec, it borrows the Vec and leaves its allocation
|
||||
untouched. For now we'll only implement the "basic" full-range version.
|
||||
|
||||
```rust,ignore
|
||||
use std::marker::PhantomData;
|
||||
|
||||
struct Drain<'a, T: 'a> {
|
||||
vec: PhantomData<&'a mut Vec<T>>,
|
||||
start: *const T,
|
||||
end: *const T,
|
||||
}
|
||||
|
||||
impl<'a, T> Iterator for Drain<'a, T> {
|
||||
type Item = T;
|
||||
fn next(&mut self) -> Option<T> {
|
||||
if self.start == self.end {
|
||||
None
|
||||
```
|
||||
|
||||
-- wait, this is seeming familiar. Let's do some more compression. Both
|
||||
IntoIter and Drain have the exact same structure, so let's just factor it out.
|
||||
|
||||
```rust
|
||||
struct RawValIter<T> {
|
||||
start: *const T,
|
||||
end: *const T,
|
||||
}
|
||||
|
||||
impl<T> RawValIter<T> {
|
||||
// unsafe to construct because it has no associated lifetimes.
|
||||
// This is necessary to store a RawValIter in the same struct as
|
||||
// its actual allocation. OK since it's a private implementation
|
||||
// detail.
|
||||
unsafe fn new(slice: &[T]) -> Self {
|
||||
RawValIter {
|
||||
start: slice.as_ptr(),
|
||||
end: if slice.len() == 0 {
|
||||
slice.as_ptr()
|
||||
} else {
|
||||
slice.as_ptr().offset(slice.len() as isize)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Iterator and DoubleEndedIterator impls identical to IntoIter.
|
||||
```
|
||||
|
||||
And IntoIter becomes the following:
|
||||
|
||||
```
|
||||
pub struct IntoIter<T> {
|
||||
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
|
||||
iter: RawValIter<T>,
|
||||
}
|
||||
|
||||
impl<T> Iterator for IntoIter<T> {
|
||||
type Item = T;
|
||||
fn next(&mut self) -> Option<T> { self.iter.next() }
|
||||
fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
|
||||
}
|
||||
|
||||
impl<T> DoubleEndedIterator for IntoIter<T> {
|
||||
fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
|
||||
}
|
||||
|
||||
impl<T> Drop for IntoIter<T> {
|
||||
fn drop(&mut self) {
|
||||
for _ in &mut self.iter {}
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Vec<T> {
|
||||
pub fn into_iter(self) -> IntoIter<T> {
|
||||
unsafe {
|
||||
let iter = RawValIter::new(&self);
|
||||
let buf = ptr::read(&self.buf);
|
||||
mem::forget(self);
|
||||
|
||||
IntoIter {
|
||||
iter: iter,
|
||||
_buf: buf,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Note that I've left a few quirks in this design to make upgrading Drain to work
|
||||
with arbitrary subranges a bit easier. In particular we *could* have RawValIter
|
||||
drain itself on drop, but that won't work right for a more complex Drain.
|
||||
We also take a slice to simplify Drain initialization.
|
||||
|
||||
Alright, now Drain is really easy:
|
||||
|
||||
```rust
|
||||
use std::marker::PhantomData;
|
||||
|
||||
pub struct Drain<'a, T: 'a> {
|
||||
vec: PhantomData<&'a mut Vec<T>>,
|
||||
iter: RawValIter<T>,
|
||||
}
|
||||
|
||||
impl<'a, T> Iterator for Drain<'a, T> {
|
||||
type Item = T;
|
||||
fn next(&mut self) -> Option<T> { self.iter.next() }
|
||||
fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
|
||||
}
|
||||
|
||||
impl<'a, T> DoubleEndedIterator for Drain<'a, T> {
|
||||
fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
|
||||
}
|
||||
|
||||
impl<'a, T> Drop for Drain<'a, T> {
|
||||
fn drop(&mut self) {
|
||||
for _ in &mut self.iter {}
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Vec<T> {
|
||||
pub fn drain(&mut self) -> Drain<T> {
|
||||
unsafe {
// Build the iterator first: once `len` is zeroed, `&self` derefs to
// an empty slice and there would be nothing left to drain.
let iter = RawValIter::new(&self);

// this is a mem::forget safety thing. If Drain is forgotten, we just
// leak the whole Vec's contents. Also we need to do this *eventually*
// anyway, so why not do it now?
self.len = 0;

Drain {
iter: iter,
vec: PhantomData,
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
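
For comparison, this is the behaviour we're imitating, shown with the standard library's `Vec` rather than ours. It is only a sketch, and `drain_all` is a hypothetical name.

```rust
/// Drains a std `Vec` and reports what is left behind.
fn drain_all() -> (Vec<i32>, usize, bool) {
    let mut v = vec![1, 2, 3];
    // `drain(..)` borrows `v` mutably and yields its elements by value.
    let drained: Vec<i32> = v.drain(..).collect();
    // Afterwards the Vec is empty, but its buffer is still there to reuse.
    (drained, v.len(), v.capacity() >= 3)
}
```

The elements come out front to back, the Vec ends up with a length of zero, and the allocation stays with the Vec.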
|
||||
|
||||
|
||||
|
||||
|
||||
# Handling Zero-Sized Types
|
||||
|
||||
It's time. We're going to fight the spectre that is zero-sized types. Safe Rust
|
||||
*never* needs to care about this, but Vec is very intensive on raw pointers and
|
||||
raw allocations, which are exactly the two things that care about
|
||||
zero-sized types. We need to be careful of two things:
|
||||
|
||||
* The raw allocator API has undefined behaviour if you pass in 0 for an
|
||||
allocation size.
|
||||
* raw pointer offsets are no-ops for zero-sized types, which will break our
|
||||
C-style pointer iterator.
|
||||
|
||||
Thankfully we abstracted out pointer-iterators and allocation handling into
|
||||
RawValIter and RawVec respectively. How mysteriously convenient.
|
||||
|
||||
|
||||
|
||||
|
||||
## Allocating Zero-Sized Types
|
||||
|
||||
So if the allocator API doesn't support zero-sized allocations, what on earth
|
||||
do we store as our allocation? Why, `heap::EMPTY` of course! Almost every operation
|
||||
with a ZST is a no-op since ZSTs have exactly one value, and therefore no state needs
|
||||
to be considered to store or load them. This actually extends to `ptr::read` and
|
||||
`ptr::write`: they won't actually look at the pointer at all. As such we *never* need
|
||||
to change the pointer.
|
||||
|
||||
Note however that our previous reliance on running out of memory before overflow is
|
||||
no longer valid with zero-sized types. We must explicitly guard against capacity
|
||||
overflow for zero-sized types.
|
||||
|
||||
Due to our current architecture, all this means is writing 3 guards, one in each
|
||||
method of RawVec.
|
||||
|
||||
```rust
|
||||
impl<T> RawVec<T> {
|
||||
fn new() -> Self {
|
||||
unsafe {
|
||||
// !0 is usize::MAX. This branch should be stripped at compile time.
|
||||
let cap = if mem::size_of::<T>() == 0 { !0 } else { 0 };
|
||||
|
||||
// heap::EMPTY doubles as "unallocated" and "zero-sized allocation"
|
||||
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: cap }
|
||||
}
|
||||
}
|
||||
|
||||
fn grow(&mut self) {
|
||||
unsafe {
|
||||
let elem_size = mem::size_of::<T>();
|
||||
|
||||
// since we set the capacity to usize::MAX when elem_size is
|
||||
// 0, getting to here necessarily means the Vec is overfull.
|
||||
assert!(elem_size != 0, "capacity overflow");
|
||||
|
||||
let align = mem::min_align_of::<T>();
|
||||
|
||||
let (new_cap, ptr) = if self.cap == 0 {
|
||||
let ptr = heap::allocate(elem_size, align);
|
||||
(1, ptr)
|
||||
} else {
|
||||
let new_cap = 2 * self.cap;
|
||||
let ptr = heap::reallocate(*self.ptr as *mut _,
|
||||
self.cap * elem_size,
|
||||
new_cap * elem_size,
|
||||
align);
|
||||
(new_cap, ptr)
|
||||
};
|
||||
|
||||
// If allocate or reallocate fail, we'll get `null` back
|
||||
if ptr.is_null() { oom() }
|
||||
|
||||
self.ptr = Unique::new(ptr as *mut _);
|
||||
self.cap = new_cap;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Drop for RawVec<T> {
|
||||
fn drop(&mut self) {
|
||||
let elem_size = mem::size_of::<T>();
|
||||
|
||||
// don't free zero-sized allocations, as they were never allocated.
|
||||
if self.cap != 0 && elem_size != 0 {
|
||||
let align = mem::min_align_of::<T>();
|
||||
|
||||
let num_bytes = elem_size * self.cap;
|
||||
unsafe {
|
||||
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
That's it. We support pushing and popping zero-sized types now. Our iterators
|
||||
(that aren't provided by slice Deref) are still busted, though.
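
The standard library's `Vec` takes the same approach, which makes for an easy sanity check. This sketch uses std's `Vec`, not ours, and `Marker` and `zst_facts` are hypothetical names.

```rust
use std::mem;

/// Hypothetical zero-sized type: no fields, so no bytes.
struct Marker;

/// Pushes a thousand ZSTs and reports (element size, length, capacity).
fn zst_facts() -> (usize, usize, usize) {
    let mut v: Vec<Marker> = Vec::new();
    for _ in 0..1000 {
        v.push(Marker); // never needs to allocate
    }
    (mem::size_of::<Marker>(), v.len(), v.capacity())
}
```

The capacity comes back as `usize::MAX` right away, so the only way to "fill" such a Vec is to overflow its length.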
|
||||
|
||||
|
||||
|
||||
|
||||
## Iterating Zero-Sized Types
|
||||
|
||||
Zero-sized offsets are no-ops. This means that our current design will always
|
||||
initialize `start` and `end` as the same value, and our iterators will yield
|
||||
nothing. The current solution to this is to cast the pointers to integers,
|
||||
increment, and then cast them back:
|
||||
|
||||
```
|
||||
impl<T> RawValIter<T> {
|
||||
unsafe fn new(slice: &[T]) -> Self {
|
||||
RawValIter {
|
||||
start: slice.as_ptr(),
|
||||
end: if mem::size_of::<T>() == 0 {
|
||||
((slice.as_ptr() as usize) + slice.len()) as *const _
|
||||
} else if slice.len() == 0 {
|
||||
slice.as_ptr()
|
||||
} else {
|
||||
slice.as_ptr().offset(slice.len() as isize)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Now we have a different bug. Instead of our iterators not running at all, our
|
||||
iterators now run *forever*. We need to do the same trick in our iterator impls.
|
||||
Also, our size_hint computation code will divide by 0 for ZSTs. Since we'll
|
||||
basically be treating the two pointers as if they point to bytes, we'll just
|
||||
map size 0 to divide by 1.
|
||||
|
||||
```
|
||||
impl<T> Iterator for RawValIter<T> {
|
||||
type Item = T;
|
||||
fn next(&mut self) -> Option<T> {
|
||||
if self.start == self.end {
|
||||
None
|
||||
} else {
|
||||
unsafe {
|
||||
let result = ptr::read(self.start);
|
||||
self.start = if mem::size_of::<T>() == 0 {
(self.start as usize + 1) as *const _
} else {
self.start.offset(1)
};
|
||||
Some(result)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn size_hint(&self) -> (usize, Option<usize>) {
|
||||
let elem_size = mem::size_of::<T>();
|
||||
let len = (self.end as usize - self.start as usize)
|
||||
/ if elem_size == 0 { 1 } else { elem_size };
|
||||
(len, Some(len))
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> DoubleEndedIterator for RawValIter<T> {
|
||||
fn next_back(&mut self) -> Option<T> {
|
||||
if self.start == self.end {
|
||||
None
|
||||
} else {
|
||||
unsafe {
|
||||
self.end = if mem::size_of::<T>() == 0 {
(self.end as usize - 1) as *const _
} else {
self.end.offset(-1)
};
|
||||
Some(ptr::read(self.end))
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And that's it. Iteration works!
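
The pointer bookkeeping can be exercised on its own without reading any elements. The following is a sketch under assumptions: `advance`, `remaining`, and `count_steps` are hypothetical helpers, and `wrapping_add` is used in place of `offset` so that no unsafe code is needed just to move a pointer.

```rust
use std::mem;

/// Moves a pointer one element forward, treating ZSTs as one byte wide.
fn advance<T>(p: *const T) -> *const T {
    if mem::size_of::<T>() == 0 {
        (p as usize + 1) as *const T
    } else {
        p.wrapping_add(1)
    }
}

/// Number of elements between two pointers, mapping size 0 to size 1.
fn remaining<T>(start: *const T, end: *const T) -> usize {
    let elem_size = mem::size_of::<T>();
    (end as usize - start as usize) / if elem_size == 0 { 1 } else { elem_size }
}

/// Walks from the start of a slice to its end, counting the steps taken.
fn count_steps<T>(slice: &[T]) -> usize {
    let mut start = slice.as_ptr();
    let end = if mem::size_of::<T>() == 0 {
        (start as usize + slice.len()) as *const T
    } else {
        start.wrapping_add(slice.len())
    };
    let mut steps = 0;
    while start != end {
        start = advance(start);
        steps += 1;
    }
    steps
}
```

With the size-0 special case removed from both `advance` and the `end` computation, the ZST case would take zero steps, which is the "yields nothing" bug described above.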
|
@ -0,0 +1,309 @@
|
||||
% The Final Code
|
||||
|
||||
```rust
|
||||
#![feature(unique)]
|
||||
#![feature(heap_api)]
|
||||
|
||||
use std::ptr::{Unique, self};
|
||||
use std::rt::heap;
|
||||
use std::mem;
|
||||
use std::ops::{Deref, DerefMut};
|
||||
use std::marker::PhantomData;
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
struct RawVec<T> {
|
||||
ptr: Unique<T>,
|
||||
cap: usize,
|
||||
}
|
||||
|
||||
impl<T> RawVec<T> {
|
||||
fn new() -> Self {
|
||||
unsafe {
|
||||
// !0 is usize::MAX. This branch should be stripped at compile time.
|
||||
let cap = if mem::size_of::<T>() == 0 { !0 } else { 0 };
|
||||
|
||||
// heap::EMPTY doubles as "unallocated" and "zero-sized allocation"
|
||||
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: cap }
|
||||
}
|
||||
}
|
||||
|
||||
fn grow(&mut self) {
|
||||
unsafe {
|
||||
let elem_size = mem::size_of::<T>();
|
||||
|
||||
// since we set the capacity to usize::MAX when elem_size is
|
||||
// 0, getting to here necessarily means the Vec is overfull.
|
||||
assert!(elem_size != 0, "capacity overflow");
|
||||
|
||||
let align = mem::min_align_of::<T>();
|
||||
|
||||
let (new_cap, ptr) = if self.cap == 0 {
|
||||
let ptr = heap::allocate(elem_size, align);
|
||||
(1, ptr)
|
||||
} else {
|
||||
let new_cap = 2 * self.cap;
|
||||
let ptr = heap::reallocate(*self.ptr as *mut _,
|
||||
self.cap * elem_size,
|
||||
new_cap * elem_size,
|
||||
align);
|
||||
(new_cap, ptr)
|
||||
};
|
||||
|
||||
// If allocate or reallocate fail, we'll get `null` back
|
||||
if ptr.is_null() { oom() }
|
||||
|
||||
self.ptr = Unique::new(ptr as *mut _);
|
||||
self.cap = new_cap;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Drop for RawVec<T> {
|
||||
fn drop(&mut self) {
|
||||
let elem_size = mem::size_of::<T>();
|
||||
if self.cap != 0 && elem_size != 0 {
|
||||
let align = mem::min_align_of::<T>();
|
||||
|
||||
let num_bytes = elem_size * self.cap;
|
||||
unsafe {
|
||||
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
pub struct Vec<T> {
|
||||
buf: RawVec<T>,
|
||||
len: usize,
|
||||
}
|
||||
|
||||
impl<T> Vec<T> {
|
||||
fn ptr(&self) -> *mut T { *self.buf.ptr }
|
||||
|
||||
fn cap(&self) -> usize { self.buf.cap }
|
||||
|
||||
pub fn new() -> Self {
|
||||
Vec { buf: RawVec::new(), len: 0 }
|
||||
}
|
||||
pub fn push(&mut self, elem: T) {
|
||||
if self.len == self.cap() { self.buf.grow(); }
|
||||
|
||||
unsafe {
|
||||
ptr::write(self.ptr().offset(self.len as isize), elem);
|
||||
}
|
||||
|
||||
// Can't fail, we'll OOM first.
|
||||
self.len += 1;
|
||||
}
|
||||
|
||||
pub fn pop(&mut self) -> Option<T> {
|
||||
if self.len == 0 {
|
||||
None
|
||||
} else {
|
||||
self.len -= 1;
|
||||
unsafe {
|
||||
Some(ptr::read(self.ptr().offset(self.len as isize)))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub fn insert(&mut self, index: usize, elem: T) {
|
||||
assert!(index <= self.len, "index out of bounds");
|
||||
if self.cap() == self.len { self.buf.grow(); }
|
||||
|
||||
unsafe {
|
||||
if index < self.len {
|
||||
ptr::copy(self.ptr().offset(index as isize),
|
||||
self.ptr().offset(index as isize + 1),
|
||||
self.len - index);
|
||||
}
|
||||
ptr::write(self.ptr().offset(index as isize), elem);
|
||||
self.len += 1;
|
||||
}
|
||||
}
|
||||
|
||||
pub fn remove(&mut self, index: usize) -> T {
|
||||
assert!(index < self.len, "index out of bounds");
|
||||
unsafe {
|
||||
self.len -= 1;
|
||||
let result = ptr::read(self.ptr().offset(index as isize));
|
||||
ptr::copy(self.ptr().offset(index as isize + 1),
|
||||
self.ptr().offset(index as isize),
|
||||
self.len - index);
|
||||
result
|
||||
}
|
||||
}
|
||||
|
||||
pub fn into_iter(self) -> IntoIter<T> {
|
||||
unsafe {
|
||||
let iter = RawValIter::new(&self);
|
||||
let buf = ptr::read(&self.buf);
|
||||
mem::forget(self);
|
||||
|
||||
IntoIter {
|
||||
iter: iter,
|
||||
_buf: buf,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub fn drain(&mut self) -> Drain<T> {
|
||||
unsafe {
// Build the iterator first: once `len` is zeroed, `&self` derefs to
// an empty slice and there would be nothing left to drain.
let iter = RawValIter::new(&self);

// this is a mem::forget safety thing. If this is forgotten, we just
// leak the whole Vec's contents. Also we need to do this *eventually*
// anyway, so why not do it now?
self.len = 0;

Drain {
iter: iter,
vec: PhantomData,
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Drop for Vec<T> {
|
||||
fn drop(&mut self) {
|
||||
while let Some(_) = self.pop() {}
|
||||
// deallocation is handled by RawVec
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Deref for Vec<T> {
|
||||
type Target = [T];
|
||||
fn deref(&self) -> &[T] {
|
||||
unsafe {
|
||||
::std::slice::from_raw_parts(self.ptr(), self.len)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> DerefMut for Vec<T> {
|
||||
fn deref_mut(&mut self) -> &mut [T] {
|
||||
unsafe {
|
||||
::std::slice::from_raw_parts_mut(self.ptr(), self.len)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
struct RawValIter<T> {
|
||||
start: *const T,
|
||||
end: *const T,
|
||||
}
|
||||
|
||||
impl<T> RawValIter<T> {
|
||||
unsafe fn new(slice: &[T]) -> Self {
|
||||
RawValIter {
|
||||
start: slice.as_ptr(),
|
||||
end: if mem::size_of::<T>() == 0 {
|
||||
((slice.as_ptr() as usize) + slice.len()) as *const _
|
||||
} else if slice.len() == 0 {
|
||||
slice.as_ptr()
|
||||
} else {
|
||||
slice.as_ptr().offset(slice.len() as isize)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Iterator for RawValIter<T> {
|
||||
type Item = T;
|
||||
fn next(&mut self) -> Option<T> {
|
||||
if self.start == self.end {
|
||||
None
|
||||
} else {
|
||||
unsafe {
|
||||
let result = ptr::read(self.start);
|
||||
self.start = if mem::size_of::<T>() == 0 {
(self.start as usize + 1) as *const _
} else {
self.start.offset(1)
};
|
||||
Some(result)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn size_hint(&self) -> (usize, Option<usize>) {
|
||||
let elem_size = mem::size_of::<T>();
|
||||
let len = (self.end as usize - self.start as usize)
|
||||
/ if elem_size == 0 { 1 } else { elem_size };
|
||||
(len, Some(len))
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> DoubleEndedIterator for RawValIter<T> {
|
||||
fn next_back(&mut self) -> Option<T> {
|
||||
if self.start == self.end {
|
||||
None
|
||||
} else {
|
||||
unsafe {
|
||||
self.end = if mem::size_of::<T>() == 0 {
(self.end as usize - 1) as *const _
} else {
self.end.offset(-1)
};
|
||||
Some(ptr::read(self.end))
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
pub struct IntoIter<T> {
|
||||
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
|
||||
iter: RawValIter<T>,
|
||||
}
|
||||
|
||||
impl<T> Iterator for IntoIter<T> {
|
||||
type Item = T;
|
||||
fn next(&mut self) -> Option<T> { self.iter.next() }
|
||||
fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
|
||||
}
|
||||
|
||||
impl<T> DoubleEndedIterator for IntoIter<T> {
|
||||
fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
|
||||
}
|
||||
|
||||
impl<T> Drop for IntoIter<T> {
|
||||
fn drop(&mut self) {
|
||||
for _ in &mut *self {}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
pub struct Drain<'a, T: 'a> {
|
||||
vec: PhantomData<&'a mut Vec<T>>,
|
||||
iter: RawValIter<T>,
|
||||
}
|
||||
|
||||
impl<'a, T> Iterator for Drain<'a, T> {
|
||||
type Item = T;
|
||||
fn next(&mut self) -> Option<T> { self.iter.next() }
|
||||
fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
|
||||
}
|
||||
|
||||
impl<'a, T> DoubleEndedIterator for Drain<'a, T> {
|
||||
fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
|
||||
}
|
||||
|
||||
impl<'a, T> Drop for Drain<'a, T> {
|
||||
fn drop(&mut self) {
|
||||
// pre-drain the iter
|
||||
for _ in &mut self.iter {}
|
||||
}
|
||||
}
|
||||
|
||||
/// Abort the process, we're out of memory!
|
||||
///
|
||||
/// In practice this is probably dead code on most OSes
|
||||
fn oom() {
|
||||
::std::process::exit(-9999);
|
||||
}
|
||||
```
|
@ -0,0 +1,50 @@
|
||||
% Insert and Remove
|
||||
|
||||
Two things *not* provided by slice are `insert` and `remove`, so let's do those next.
|
||||
|
||||
Insert needs to shift all the elements at and after the target index to the right by one.
|
||||
To do this we need to use `ptr::copy`, which is our version of C's `memmove`.
|
||||
This copies some chunk of memory from one location to another, correctly handling
|
||||
the case where the source and destination overlap (which will definitely happen
|
||||
here).
|
||||
|
||||
If we insert at index `i`, we want to shift the `[i .. len]` to `[i+1 .. len+1]`
|
||||
using the *old* len.
|
||||
|
||||
```rust
|
||||
pub fn insert(&mut self, index: usize, elem: T) {
|
||||
// Note: `<=` because it's valid to insert after everything
|
||||
// which would be equivalent to push.
|
||||
assert!(index <= self.len, "index out of bounds");
|
||||
if self.cap == self.len { self.grow(); }
|
||||
|
||||
unsafe {
|
||||
if index < self.len {
|
||||
// ptr::copy(src, dest, len): "copy from source to dest len elems"
|
||||
ptr::copy(self.ptr.offset(index as isize),
|
||||
self.ptr.offset(index as isize + 1),
|
||||
self.len - index);
|
||||
}
|
||||
ptr::write(self.ptr.offset(index as isize), elem);
|
||||
self.len += 1;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Remove behaves in the opposite manner. We need to shift all the elements from
|
||||
`[i+1 .. len + 1]` to `[i .. len]` using the *new* len.
|
||||
|
||||
```rust
|
||||
pub fn remove(&mut self, index: usize) -> T {
|
||||
// Note: `<` because it's *not* valid to remove after everything
|
||||
assert!(index < self.len, "index out of bounds");
|
||||
unsafe {
|
||||
self.len -= 1;
|
||||
let result = ptr::read(self.ptr.offset(index as isize));
|
||||
ptr::copy(self.ptr.offset(index as isize + 1),
|
||||
self.ptr.offset(index as isize),
|
||||
self.len - index);
|
||||
result
|
||||
}
|
||||
}
|
||||
```
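
Both shifts can be tried out on a plain array, which makes the index arithmetic easier to check. This is a sketch only: `insert_at` and `remove_at` are hypothetical free functions, the element type is fixed to `i32`, and the "capacity" is just the length of the backing slice.

```rust
use std::ptr;

/// Inserts `elem` at `index` among the first `len` live items of `buf`.
/// Returns the new length.
fn insert_at(buf: &mut [i32], len: usize, index: usize, elem: i32) -> usize {
    assert!(index <= len, "index out of bounds");
    assert!(len < buf.len(), "no spare capacity in this sketch");
    unsafe {
        let p = buf.as_mut_ptr();
        // Shift [index .. len] to [index+1 .. len+1]; the ranges overlap,
        // which `ptr::copy` (like `memmove`) handles correctly.
        ptr::copy(p.add(index), p.add(index + 1), len - index);
        ptr::write(p.add(index), elem);
    }
    len + 1
}

/// Removes the item at `index` among the first `len` live items of `buf`.
/// Returns the removed item and the new length.
fn remove_at(buf: &mut [i32], len: usize, index: usize) -> (i32, usize) {
    assert!(index < len, "index out of bounds");
    assert!(len <= buf.len());
    let new_len = len - 1;
    unsafe {
        let p = buf.as_mut_ptr();
        let result = ptr::read(p.add(index));
        // Shift [index+1 .. len] down by one, counted with the *new* len.
        ptr::copy(p.add(index + 1), p.add(index), new_len - index);
        (result, new_len)
    }
}
```

Inserting `9` at index 1 of `[1, 2, 3, 4]` gives `[1, 9, 2, 3, 4]`, and removing index 1 again restores the original four items.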
|
@ -0,0 +1,293 @@
|
||||
% IntoIter
|
||||
|
||||
Let's move on to writing iterators. `iter` and `iter_mut` have already been
|
||||
written for us thanks to The Magic of Deref. However there are two interesting
|
||||
iterators that Vec provides that slices can't: `into_iter` and `drain`.
|
||||
|
||||
IntoIter consumes the Vec by-value, and can consequently yield its elements
|
||||
by-value. In order to enable this, IntoIter needs to take control of Vec's
|
||||
allocation.
|
||||
|
||||
IntoIter needs to be DoubleEnded as well, to enable reading from both ends.
|
||||
Reading from the back could just be implemented as calling `pop`, but reading
|
||||
from the front is harder. We could call `remove(0)` but that would be insanely
|
||||
expensive. Instead we're going to just use `ptr::read` to copy values out of either
|
||||
end of the Vec without mutating the buffer at all.
|
||||
|
||||
To do this we're going to use a very common C idiom for array iteration. We'll
|
||||
make two pointers; one that points to the start of the array, and one that points
|
||||
to one-element past the end. When we want an element from one end, we'll read out
|
||||
the value pointed to at that end and move the pointer over by one. When the two
|
||||
pointers are equal, we know we're done.
|
||||
|
||||
Note that the order of read and offset is reversed for `next` and `next_back`.
|
||||
For `next_back` the pointer is always *after* the element it wants to read next,
|
||||
while for `next` the pointer is always *at* the element it wants to read next.
|
||||
To see why this is, consider the case where every element but one has been yielded.
|
||||
|
||||
The array looks like this:
|
||||
|
||||
```text
|
||||
          S  E
|
||||
[X, X, X, O, X, X, X]
|
||||
```
|
||||
|
||||
If E pointed directly at the element it wanted to yield next, it would be
|
||||
indistinguishable from the case where there are no more elements to yield.
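
The same start/one-past-the-end convention can be shown safely with indices instead of pointers. This is a sketch for illustration: `Cursor` is a hypothetical type that borrows a slice and yields references, unlike IntoIter, which will yield values.

```rust
/// Hypothetical double-ended cursor over a borrowed slice.
struct Cursor<'a, T> {
    data: &'a [T],
    start: usize, // index *of* the next element `next` yields
    end: usize,   // index one *past* the next element `next_back` yields
}

impl<'a, T> Cursor<'a, T> {
    fn new(data: &'a [T]) -> Self {
        Cursor { data, start: 0, end: data.len() }
    }
}

impl<'a, T> Iterator for Cursor<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<&'a T> {
        if self.start == self.end {
            None
        } else {
            let data: &'a [T] = self.data;
            // Read first, then move: `start` already points at the element.
            let item = &data[self.start];
            self.start += 1;
            Some(item)
        }
    }
}

impl<'a, T> DoubleEndedIterator for Cursor<'a, T> {
    fn next_back(&mut self) -> Option<&'a T> {
        if self.start == self.end {
            None
        } else {
            let data: &'a [T] = self.data;
            // Move first, then read: `end` sits one past the element.
            self.end -= 1;
            Some(&data[self.end])
        }
    }
}
```

Because `end` is one past the last unyielded element, `start == end` means "empty" and nothing else, regardless of which end was consumed last.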
|
||||
|
||||
So we're going to use the following struct:
|
||||
|
||||
```rust
|
||||
struct IntoIter<T> {
|
||||
buf: Unique<T>,
|
||||
cap: usize,
|
||||
start: *const T,
|
||||
end: *const T,
|
||||
}
|
||||
```
|
||||
|
||||
One last subtle detail: if our Vec is empty, we want to produce an empty iterator.
|
||||
This will actually technically fall out doing the naive thing of:
|
||||
|
||||
```text
|
||||
start = ptr
|
||||
end = ptr.offset(len)
|
||||
```
|
||||
|
||||
However because `offset` is marked as a GEP inbounds instruction, this will tell
|
||||
LLVM that ptr is allocated and won't alias other allocated memory. This is fine
|
||||
for zero-sized types, as they can't alias anything. However if we're using
|
||||
`heap::EMPTY` as a sentinel for a non-allocation for a *non-zero-sized* type,
|
||||
this can cause undefined behaviour. Alas, we must therefore special case either
|
||||
cap or len being 0 to not do the offset.
|
||||
|
||||
So this is what we end up with for initialization:
|
||||
|
||||
```rust
|
||||
impl<T> Vec<T> {
|
||||
fn into_iter(self) -> IntoIter<T> {
|
||||
// Can't destructure Vec since it's Drop
|
||||
let ptr = self.ptr;
|
||||
let cap = self.cap;
|
||||
let len = self.len;
|
||||
|
||||
// Make sure not to drop Vec since that will free the buffer
|
||||
mem::forget(self);
|
||||
|
||||
unsafe {
|
||||
IntoIter {
|
||||
buf: ptr,
|
||||
cap: cap,
|
||||
start: *ptr,
|
||||
end: if cap == 0 {
|
||||
// can't offset off this pointer, it's not allocated!
|
||||
*ptr
|
||||
} else {
|
||||
ptr.offset(len as isize)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Here's iterating forward:
|
||||
|
||||
```rust
|
||||
impl<T> Iterator for IntoIter<T> {
|
||||
type Item = T;
|
||||
fn next(&mut self) -> Option<T> {
|
||||
if self.start == self.end {
|
||||
None
|
||||
} else {
|
||||
unsafe {
|
||||
let result = ptr::read(self.start);
|
||||
self.start = self.start.offset(1);
|
||||
Some(result)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn size_hint(&self) -> (usize, Option<usize>) {
|
||||
let len = (self.end as usize - self.start as usize)
|
||||
/ mem::size_of::<T>();
|
||||
(len, Some(len))
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And here's iterating backwards.
|
||||
|
||||
```rust
|
||||
impl<T> DoubleEndedIterator for IntoIter<T> {
|
||||
fn next_back(&mut self) -> Option<T> {
|
||||
if self.start == self.end {
|
||||
None
|
||||
} else {
|
||||
unsafe {
|
||||
self.end = self.end.offset(-1);
|
||||
Some(ptr::read(self.end))
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Because IntoIter takes ownership of its allocation, it needs to implement Drop
|
||||
to free it. However it *also* wants to implement Drop to drop any elements it
|
||||
contains that weren't yielded.
|
||||
|
||||
|
||||
```rust
|
||||
impl<T> Drop for IntoIter<T> {
|
||||
fn drop(&mut self) {
|
||||
if self.cap != 0 {
|
||||
// drop any remaining elements
|
||||
for _ in &mut *self {}
|
||||
|
||||
let align = mem::min_align_of::<T>();
|
||||
let elem_size = mem::size_of::<T>();
|
||||
let num_bytes = elem_size * self.cap;
|
||||
unsafe {
|
||||
heap::deallocate(*self.buf as *mut _, num_bytes, align);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
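
The "drop whatever wasn't yielded" behaviour is easy to observe with the standard library's `into_iter`, which follows the same rule. This sketch uses std's `Vec` and a reference count as a stand-in for "how many elements are still alive"; `live_handles_after_partial_iteration` is a hypothetical name.

```rust
use std::rc::Rc;

/// Consumes one element, drops the iterator, and reports how many
/// handles to `token` are still alive afterwards.
fn live_handles_after_partial_iteration() -> usize {
    let token = Rc::new(());
    // Three clones in the Vec plus `token` itself: four handles.
    let v = vec![token.clone(), token.clone(), token.clone()];

    let mut iter = v.into_iter();
    let first = iter.next(); // one element is yielded to us...
    drop(first);             // ...and we drop it ourselves

    // Dropping the iterator must drop the two elements it never yielded.
    drop(iter);

    Rc::strong_count(&token)
}
```

If the iterator forgot to drop its remaining elements, the count would stay at three instead of returning to one.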
|
||||
|
||||
We've actually reached an interesting situation here: we've duplicated the logic
|
||||
for specifying a buffer and freeing its memory. Now that we've implemented it and
|
||||
identified *actual* logic duplication, this is a good time to perform some logic
|
||||
compression.
|
||||
|
||||
We're going to abstract out the `(ptr, cap)` pair and give them the logic for
|
||||
allocating, growing, and freeing:
|
||||
|
||||
```rust
|
||||
|
||||
struct RawVec<T> {
|
||||
ptr: Unique<T>,
|
||||
cap: usize,
|
||||
}
|
||||
|
||||
impl<T> RawVec<T> {
|
||||
fn new() -> Self {
|
||||
assert!(mem::size_of::<T>() != 0, "TODO: implement ZST support");
|
||||
unsafe {
|
||||
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: 0 }
|
||||
}
|
||||
}
|
||||
|
||||
// unchanged from Vec
|
||||
fn grow(&mut self) {
|
||||
unsafe {
|
||||
let align = mem::min_align_of::<T>();
|
||||
let elem_size = mem::size_of::<T>();
|
||||
|
||||
let (new_cap, ptr) = if self.cap == 0 {
|
||||
let ptr = heap::allocate(elem_size, align);
|
||||
(1, ptr)
|
||||
} else {
|
||||
let new_cap = 2 * self.cap;
|
||||
let ptr = heap::reallocate(*self.ptr as *mut _,
|
||||
self.cap * elem_size,
|
||||
new_cap * elem_size,
|
||||
align);
|
||||
(new_cap, ptr)
|
||||
};
|
||||
|
||||
// If allocate or reallocate fail, we'll get `null` back
|
||||
if ptr.is_null() { oom() }
|
||||
|
||||
self.ptr = Unique::new(ptr as *mut _);
|
||||
self.cap = new_cap;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
impl<T> Drop for RawVec<T> {
|
||||
fn drop(&mut self) {
|
||||
if self.cap != 0 {
|
||||
let align = mem::min_align_of::<T>();
|
||||
let elem_size = mem::size_of::<T>();
|
||||
let num_bytes = elem_size * self.cap;
|
||||
unsafe {
|
||||
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And change Vec as follows:
|
||||
|
||||
```rust
|
||||
pub struct Vec<T> {
|
||||
buf: RawVec<T>,
|
||||
len: usize,
|
||||
}
|
||||
|
||||
impl<T> Vec<T> {
|
||||
fn ptr(&self) -> *mut T { *self.buf.ptr }
|
||||
|
||||
fn cap(&self) -> usize { self.buf.cap }
|
||||
|
||||
pub fn new() -> Self {
|
||||
Vec { buf: RawVec::new(), len: 0 }
|
||||
}
|
||||
|
||||
// push/pop/insert/remove largely unchanged:
|
||||
// * `self.ptr -> self.ptr()`
|
||||
// * `self.cap -> self.cap()`
|
||||
// * `self.grow -> self.buf.grow()`
|
||||
}
|
||||
|
||||
impl<T> Drop for Vec<T> {
|
||||
fn drop(&mut self) {
|
||||
while let Some(_) = self.pop() {}
|
||||
// deallocation is handled by RawVec
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And finally we can really simplify IntoIter:
|
||||
|
||||
```rust
|
||||
struct IntoIter<T> {
|
||||
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
|
||||
start: *const T,
|
||||
end: *const T,
|
||||
}
|
||||
|
||||
// next and next_back literally unchanged since they never referred to the buf
|
||||
|
||||
impl<T> Drop for IntoIter<T> {
|
||||
fn drop(&mut self) {
|
||||
// only need to ensure all our elements are read;
|
||||
// buffer will clean itself up afterwards.
|
||||
for _ in &mut *self {}
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Vec<T> {
|
||||
pub fn into_iter(self) -> IntoIter<T> {
|
||||
unsafe {
|
||||
// need to use ptr::read to unsafely move the buf out since it's
|
||||
// not Copy.
|
||||
let buf = ptr::read(&self.buf);
|
||||
let len = self.len;
|
||||
mem::forget(self);
|
||||
|
||||
IntoIter {
|
||||
start: *buf.ptr,
|
||||
end: buf.ptr.offset(len as isize),
|
||||
_buf: buf,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Much better.
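
The `heap` API used above is unstable. As a rough sketch of the same `RawVec` idea on stable Rust, the version below swaps in `std::alloc` and `NonNull` (an assumption of this example, not what the chapter uses). The `ptr()` and `cap()` accessors are hypothetical additions for illustration, and zero-sized types are still rejected.

```rust
use std::alloc::{self, Layout};
use std::mem;
use std::ptr::NonNull;

// Hypothetical stand-in for the chapter's RawVec, built on the stable
// `std::alloc` API instead of the unstable `heap` API.
pub struct RawVec<T> {
    ptr: NonNull<T>,
    cap: usize,
}

impl<T> RawVec<T> {
    pub fn new() -> Self {
        assert!(mem::size_of::<T>() != 0, "ZSTs are not handled in this sketch");
        // A dangling, well-aligned, non-null pointer marks "no allocation".
        RawVec { ptr: NonNull::dangling(), cap: 0 }
    }

    pub fn ptr(&self) -> *mut T { self.ptr.as_ptr() }

    pub fn cap(&self) -> usize { self.cap }

    pub fn grow(&mut self) {
        let new_cap = if self.cap == 0 {
            1
        } else {
            self.cap.checked_mul(2).expect("capacity overflow")
        };
        // `Layout::array` returns an error if the total size is too large.
        let new_layout = Layout::array::<T>(new_cap).expect("capacity overflow");

        let raw = if self.cap == 0 {
            unsafe { alloc::alloc(new_layout) }
        } else {
            let old_layout = Layout::array::<T>(self.cap).unwrap();
            unsafe {
                alloc::realloc(self.ptr.as_ptr() as *mut u8, old_layout, new_layout.size())
            }
        };

        // Allocation failure is reported as a null pointer.
        self.ptr = match NonNull::new(raw as *mut T) {
            Some(p) => p,
            None => alloc::handle_alloc_error(new_layout),
        };
        self.cap = new_cap;
    }
}

impl<T> Drop for RawVec<T> {
    fn drop(&mut self) {
        // `cap == 0` means nothing was ever allocated, so there is nothing to free.
        if self.cap != 0 {
            let layout = Layout::array::<T>(self.cap).unwrap();
            unsafe { alloc::dealloc(self.ptr.as_ptr() as *mut u8, layout) }
        }
    }
}
```

As in the chapter's version, the buffer only frees memory; dropping the elements stays the job of whoever wraps it.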
|
@ -0,0 +1,61 @@
|
||||
% Layout
|
||||
|
||||
First off, we need to come up with the struct layout. Naively we want this
|
||||
design:
|
||||
|
||||
```rust
|
||||
struct Vec<T> {
|
||||
ptr: *mut T,
|
||||
cap: usize,
|
||||
len: usize,
|
||||
}
|
||||
```
|
||||
|
||||
And indeed this would compile. Unfortunately, it would be incorrect. The compiler
|
||||
will give us too strict variance, so e.g. an `&Vec<&'static str>` couldn't be used
|
||||
where an `&Vec<&'a str>` was expected. More importantly, it will give incorrect
|
||||
ownership information to dropck, as it will conservatively assume we don't own
|
||||
any values of type `T`. See [the chapter on ownership and lifetimes]
|
||||
(lifetimes.html) for details.
|
||||
|
||||
As we saw in the lifetimes chapter, we should use `Unique<T>` in place of `*mut T`
|
||||
when we have a raw pointer to an allocation we own:
|
||||
|
||||
|
||||
```rust
|
||||
#![feature(unique)]
|
||||
|
||||
use std::ptr::{Unique, self};
|
||||
|
||||
pub struct Vec<T> {
|
||||
ptr: Unique<T>,
|
||||
cap: usize,
|
||||
len: usize,
|
||||
}
|
||||
```
|
||||
|
||||
As a recap, Unique is a wrapper around a raw pointer that declares that:
|
||||
|
||||
* We own at least one value of type `T`
|
||||
* We are Send/Sync iff `T` is Send/Sync
|
||||
* Our pointer is never null (and therefore `Option<Vec<T>>` is null-pointer-optimized)
|
||||
|
||||
That last point is subtle. First, it makes `Unique::new` unsafe to call, because
|
||||
putting `null` inside of it is Undefined Behaviour. It also throws a
|
||||
wrench in an important feature of Vec (and indeed all of the std collections):
|
||||
an empty Vec doesn't actually allocate at all. So if we can't allocate,
|
||||
but also can't put a null pointer in `ptr`, what do we do in
|
||||
`Vec::new`? Well, we just put some other garbage in there!
|
||||
|
||||
This is perfectly fine because we already have `cap == 0` as our sentinel for no
|
||||
allocation. We don't even need to handle it specially in almost any code because
|
||||
we usually need to check if `cap > len` or `len > 0` anyway. The traditional
|
||||
Rust value to put here is `0x01`. The standard library actually exposes this
|
||||
as `std::rt::heap::EMPTY`. There are quite a few places where we'll want to use
|
||||
`heap::EMPTY` because there's no real allocation to talk about but `null` would
|
||||
make the compiler angry.
|
||||
|
||||
All of the `heap` API is totally unstable under the `heap_api` feature, though.
|
||||
We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of
|
||||
the `heap` API anyway, so let's just get that dependency over with.
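
To make the "non-null placeholder" idea concrete, here is a minimal sketch on stable Rust. It assumes `std::ptr::NonNull` in place of the unstable `Unique`, and `NonNull::dangling()` in place of `heap::EMPTY`. `NonNull` alone does not declare ownership of `T`, so the sketch adds a `PhantomData<T>` field. The names `MyVec` and `option_ptr_is_pointer_sized` are hypothetical.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical struct mirroring the layout discussed above.
pub struct MyVec<T> {
    ptr: NonNull<T>,
    cap: usize,
    len: usize,
    // Tells the compiler that we logically own values of type `T`.
    _owns: PhantomData<T>,
}

impl<T> MyVec<T> {
    pub fn new() -> Self {
        // No allocation yet: store a non-null, well-aligned placeholder and
        // rely on `cap == 0` as the "nothing allocated" sentinel.
        MyVec { ptr: NonNull::dangling(), cap: 0, len: 0, _owns: PhantomData }
    }

    pub fn as_ptr(&self) -> *mut T { self.ptr.as_ptr() }

    pub fn cap(&self) -> usize { self.cap }

    pub fn len(&self) -> usize { self.len }
}

// Because `NonNull<T>` can never be null, `Option<NonNull<T>>` can use the
// null bit pattern for `None` and stays pointer-sized.
pub fn option_ptr_is_pointer_sized<T>() -> bool {
    size_of::<Option<NonNull<T>>>() == size_of::<*mut T>()
}
```

The placeholder pointer must never be dereferenced or passed to the allocator; every code path has to consult `cap` first.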
|
||||
|
@ -0,0 +1,55 @@
|
||||
% Push and Pop
|
||||
|
||||
Alright. We can initialize. We can allocate. Let's actually implement some
|
||||
functionality! Let's start with `push`. All it needs to do is check if we're
|
||||
full to grow, unconditionally write to the next index, and then increment our
|
||||
length.
|
||||
|
||||
To do the write we have to be careful not to evaluate the memory we want to write
|
||||
to. At worst, it's truly uninitialized memory from the allocator. At best it's the
|
||||
bits of some old value we popped off. Either way, we can't just index to the memory
|
||||
and dereference it, because that will evaluate the memory as a valid instance of
|
||||
T. Worse, `foo[idx] = x` will try to call `drop` on the old value of `foo[idx]`!
|
||||
|
||||
The correct way to do this is with `ptr::write`, which just blindly overwrites the
|
||||
target address with the bits of the value we provide. No evaluation involved.
|
||||
|
||||
For `push`, if the old len (before push was called) is 0, then we want to write
|
||||
to the 0th index. So we should offset by the old len.
|
||||
|
||||
```rust
|
||||
pub fn push(&mut self, elem: T) {
|
||||
if self.len == self.cap { self.grow(); }
|
||||
|
||||
unsafe {
|
||||
ptr::write(self.ptr.offset(self.len as isize), elem);
|
||||
}
|
||||
|
||||
// Can't fail, we'll OOM first.
|
||||
self.len += 1;
|
||||
}
|
||||
```
|
||||
|
||||
Easy! How about `pop`? Although this time the index we want to access is
|
||||
initialized, Rust won't just let us dereference the location of memory to move
|
||||
the value out, because that *would* leave the memory uninitialized! For this we
|
||||
need `ptr::read`, which just copies out the bits from the target address and
|
||||
interprets them as a value of type T. This will leave the memory at this address
|
||||
*logically* uninitialized, even though there is in fact a perfectly good instance
|
||||
of T there.
|
||||
|
||||
For `pop`, if the old len is 1, we want to read out of the 0th index. So we
|
||||
should offset by the *new* len.
|
||||
|
||||
```rust
|
||||
pub fn pop(&mut self) -> Option<T> {
|
||||
if self.len == 0 {
|
||||
None
|
||||
} else {
|
||||
self.len -= 1;
|
||||
unsafe {
|
||||
Some(ptr::read(self.ptr.offset(self.len as isize)))
|
||||
}
|
||||
}
|
||||
}
|
||||
```
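
The same write-at-old-len and read-at-new-len pattern can be tried without any allocator. The sketch below is a hypothetical fixed-capacity stack (`Stack4`, with a capacity of 4 chosen purely for illustration) backed by an array of `MaybeUninit<T>`. It is not part of the chapter's `Vec`, only a small self-contained place to watch `ptr::write` and `ptr::read` at work.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical fixed-capacity stack used only to illustrate the technique.
pub struct Stack4<T> {
    slots: [MaybeUninit<T>; 4],
    len: usize,
}

impl<T> Stack4<T> {
    pub fn new() -> Self {
        Stack4 { slots: std::array::from_fn(|_| MaybeUninit::uninit()), len: 0 }
    }

    // Returns the element back to the caller if the stack is full.
    pub fn push(&mut self, elem: T) -> Result<(), T> {
        if self.len == self.slots.len() {
            return Err(elem);
        }
        // Write at the *old* len. `ptr::write` neither reads nor drops
        // whatever bits were in the slot before.
        unsafe { ptr::write(self.slots[self.len].as_mut_ptr(), elem); }
        self.len += 1;
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // Read at the *new* len. The slot is now logically uninitialized,
        // and decrementing `len` first ensures it is never read again.
        unsafe { Some(ptr::read(self.slots[self.len].as_ptr())) }
    }
}

impl<T> Drop for Stack4<T> {
    fn drop(&mut self) {
        // Pop everything so that each remaining element is dropped exactly once.
        while let Some(_) = self.pop() {}
    }
}
```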
|
@ -0,0 +1,88 @@
|
||||
% Working with Unsafe
|
||||
|
||||
Rust generally only gives us the tools to talk about safety in a scoped and
|
||||
binary manner. Unfortunately reality is significantly more complicated than that.
|
||||
For instance, consider the following toy function:
|
||||
|
||||
```rust
|
||||
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
|
||||
if idx < arr.len() {
|
||||
unsafe {
|
||||
Some(*arr.get_unchecked(idx))
|
||||
}
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Clearly, this function is safe. We check that the index is in bounds, and if it
|
||||
is, index into the array in an unchecked manner. But even in such a trivial
|
||||
function, the scope of the unsafe block is questionable. Consider changing the
|
||||
`<` to a `<=`:
|
||||
|
||||
```rust
|
||||
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
|
||||
if idx <= arr.len() {
|
||||
unsafe {
|
||||
Some(*arr.get_unchecked(idx))
|
||||
}
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This program is now unsound, and yet *we only modified safe code*. This is the
|
||||
fundamental problem of safety: it's non-local. The soundness of our unsafe
|
||||
operations necessarily depends on the state established by "safe" operations.
|
||||
Although safety *is* modular (we *still* don't need to worry about
|
||||
unrelated safety issues like uninitialized memory), it quickly contaminates the
|
||||
surrounding code.
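
One way to gain some confidence in such a check is to compare the function against the safe API at the boundary. The sketch below repeats the sound `<` version of `do_idx` and adds a hypothetical helper, `agrees_with_get`, that tests every index from `0` up to and including `arr.len()`. This is only a spot check for illustration, not a proof of soundness.

```rust
pub fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
    if idx < arr.len() {
        // Sound only because of the bounds check on the line above.
        unsafe { Some(*arr.get_unchecked(idx)) }
    } else {
        None
    }
}

// Hypothetical helper: compares `do_idx` with the safe `get` at every index
// in `0..=arr.len()`, which includes the one-past-the-end index that the
// `<=` variant would read out of bounds.
pub fn agrees_with_get(arr: &[u8]) -> bool {
    (0..=arr.len()).all(|i| do_idx(i, arr) == arr.get(i).copied())
}
```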
|
||||
|
||||
Trickier than that is when we get into actual statefulness. Consider a simple
|
||||
implementation of `Vec`:
|
||||
|
||||
```rust
|
||||
// Note this definition is insufficient. See the section on lifetimes.
|
||||
struct Vec<T> {
|
||||
ptr: *mut T,
|
||||
len: usize,
|
||||
cap: usize,
|
||||
}
|
||||
|
||||
// Note this implementation does not correctly handle zero-sized types.
|
||||
// We currently live in a nice imaginary world of only positive fixed-size
|
||||
// types.
|
||||
impl<T> Vec<T> {
|
||||
fn push(&mut self, elem: T) {
|
||||
if self.len == self.cap {
|
||||
// not important for this example
|
||||
self.reallocate();
|
||||
}
|
||||
unsafe {
|
||||
ptr::write(self.ptr.offset(self.len as isize), elem);
|
||||
self.len += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This code is simple enough to reasonably audit and verify. Now consider
|
||||
adding the following method:
|
||||
|
||||
```rust
|
||||
fn make_room(&mut self) {
|
||||
// grow the capacity
|
||||
self.cap += 1;
|
||||
}
|
||||
```
|
||||
|
||||
This code is safe, but it is also completely unsound. Changing the capacity
|
||||
violates the invariants of Vec (that `cap` reflects the allocated space in the
|
||||
Vec). This is not something the rest of `Vec` can guard against. It *has* to
|
||||
trust the capacity field because there's no way to verify it.
|
||||
|
||||
`unsafe` does more than pollute a whole function: it pollutes a whole *module*.
|
||||
Generally, the only bullet-proof way to limit the scope of unsafe code is at the
|
||||
module boundary with privacy.
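
As a minimal sketch of what that looks like in practice, the hypothetical module below keeps its fields private so that only code inside `buf` can touch the invariant `len <= data.len()`. The names `buf`, `Prefix`, and `extend_by_one` are invented for this example. Code outside the module cannot write `len` directly, so an equivalent of `make_room` cannot be written there.

```rust
pub mod buf {
    /// Invariant: `len <= data.len()`.
    /// The fields are private, so only this module can break it.
    pub struct Prefix {
        data: Box<[u8]>,
        len: usize,
    }

    impl Prefix {
        pub fn new(data: Vec<u8>) -> Self {
            Prefix { data: data.into_boxed_slice(), len: 0 }
        }

        /// Exposes one more byte, refusing to break the invariant.
        pub fn extend_by_one(&mut self) -> bool {
            if self.len < self.data.len() {
                self.len += 1;
                true
            } else {
                false
            }
        }

        pub fn get(&self, idx: usize) -> Option<u8> {
            if idx < self.len {
                // Sound because `idx < len <= data.len()`.
                unsafe { Some(*self.data.get_unchecked(idx)) }
            } else {
                None
            }
        }
    }
}
```

Auditing the `unsafe` block now means auditing every function in `buf` that can write `len` or `data`, and nothing outside the module.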
|