24 KiB

Raw Blame History

% Example: Implementing Vec

TODO: audit for non-ZST offsets from heap::empty

To bring everything together, we're going to write std::Vec from scratch. Because the all the best tools for writing unsafe code are unstable, this project will only work on nightly (as of Rust 1.2.0).

Layout

First off, we need to come up with the struct layout. Naively we want this design:

struct Vec<T> {
    ptr: *mut T,
    cap: usize,
    len: usize,
}

And indeed this would compile. Unfortunately, it would be incorrect. The compiler will give us too strict variance, so e.g. an &Vec<&'static str> couldn't be used where an &Vec<&'a str> was expected. More importantly, it will give incorrect ownership information to dropck, as it will conservatively assume we don't own any values of type T. See [the chapter on ownership and lifetimes] (lifetimes.html) for details.

As we saw in the lifetimes chapter, we should use Unique<T> in place of *mut T when we have a raw pointer to an allocation we own:

#![feature(unique)]

use std::ptr::{Unique, self};

pub struct Vec<T> {
    ptr: Unique<T>,
    cap: usize,
    len: usize,
}

As a recap, Unique is a wrapper around a raw pointer that declares that:

We own at least one value of type T
We are Send/Sync iff T is Send/Sync
Our pointer is never null (and therefore Option<Vec> is null-pointer-optimized)

That last point is subtle. First, it makes Unique::new unsafe to call, because putting null inside of it is Undefined Behaviour. It also throws a wrench in an important feature of Vec (and indeed all of the std collections): an empty Vec doesn't actually allocate at all. So if we can't allocate, but also can't put a null pointer in ptr, what do we do in Vec::new? Well, we just put some other garbage in there!

This is perfectly fine because we already have cap == 0 as our sentinel for no allocation. We don't even need to handle it specially in almost any code because we usually need to check if cap > len or len > 0 anyway. The traditional Rust value to put here is 0x01. The standard library actually exposes this as std::rt::heap::EMPTY. There are quite a few places where we'll want to use heap::EMPTY because there's no real allocation to talk about but null would make the compiler angry.

All of the heap API is totally unstable under the alloc feature, though. We could trivially define heap::EMPTY ourselves, but we'll want the rest of the heap API anyway, so let's just get that dependency over with.

Allocating Memory

So:

#![feature(alloc)]

use std::rt::heap::EMPTY;
use std::mem;

impl<T> Vec<T> {
    fn new() -> Self {
        assert!(mem::size_of::<T>() != 0, "We're not ready to handle ZSTs");
        unsafe {
            // need to cast EMPTY to the actual ptr type we want, let
            // inference handle it.
            Vec { ptr: Unique::new(heap::EMPTY as *mut _), len: 0, cap: 0 }
        }
    }
}

I slipped in that assert there because zero-sized types will require some special handling throughout our code, and I want to defer the issue for now. Without this assert, some of our early drafts will do some Very Bad Things.

Next we need to figure out what to actually do when we do want space. For that, we'll need to use the rest of the heap APIs. These basically allow us to talk directly to Rust's instance of jemalloc.

We'll also need a way to handle out-of-memory conditions. The standard library calls the abort intrinsic, but calling intrinsics from normal Rust code is a pretty bad idea. Unfortunately, the abort exposed by the standard library allocates. Not something we want to do during oom! Instead, we'll call std::process::exit.

fn oom() {
    ::std::process::exit(-9999);
}

Okay, now we can write growing. Roughly, we want to have this logic:

if cap == 0:
    allocate()
    cap = 1
else
    reallocate
    cap *= 2

But Rust's only supported allocator API is so low level that we'll need to do a fair bit of extra work, though. We also need to guard against some special conditions that can occur with really large allocations. In particular, we index into arrays using unsigned integers, but ptr::offset takes signed integers. This means Bad Things will happen if we ever manage to grow to contain more than isize::MAX elements. Thankfully, this isn't something we need to worry about in most cases.

On 64-bit targets we're artifically limited to only 48-bits, so we'll run out of memory far before we reach that point. However on 32-bit targets, particularly those with extensions to use more of the address space, it's theoretically possible to successfully allocate more than isize::MAX bytes of memory. Still, we only really need to worry about that if we're allocating elements that are a byte large. Anything else will use up too much space.

However since this is a tutorial, we're not going to be particularly optimal here, and just unconditionally check, rather than use clever platform-specific cfgs.

fn grow(&mut self) {
    // this is all pretty delicate, so let's say it's all unsafe
    unsafe {
        let align = mem::min_align_of::<T>();
        let elem_size = mem::size_of::<T>();

        let (new_cap, ptr) = if self.cap == 0 {
            let ptr = heap::allocate(elem_size, align);
            (1, ptr)
        } else {
            // as an invariant, we can assume that `self.cap < isize::MAX`,
            // so this doesn't need to be checked.
            let new_cap = self.cap * 2;
            // Similarly this can't overflow due to previously allocating this
            let old_num_bytes = self.cap * elem_size;

            // check that the new allocation doesn't exceed `isize::MAX` at all
            // regardless of the actual size of the capacity. This combines the
            // `new_cap <= isize::MAX` and `new_num_bytes <= usize::MAX` checks
            // we need to make. We lose the ability to allocate e.g. 2/3rds of
            // the address space with a single Vec of i16's on 32-bit though.
            // Alas, poor Yorick -- I knew him, Horatio.
            assert!(old_num_bytes <= (::std::isize::MAX as usize) / 2,
                    "capacity overflow");

            let new_num_bytes = old_num_bytes * 2;
            let ptr = heap::reallocate(*self.ptr as *mut _,
                                        old_num_bytes,
                                        new_num_bytes,
                                        align);
            (new_cap, ptr)
        };

        // If allocate or reallocate fail, we'll get `null` back
        if ptr.is_null() { oom(); }

        self.ptr = Unique::new(ptr as *mut _);
        self.cap = new_cap;
    }
}

Nothing particularly tricky here. Just computing sizes and alignments and doing some careful multiplication checks.

Push and Pop

Alright. We can initialize. We can allocate. Let's actually implement some functionality! Let's start with push. All it needs to do is check if we're full to grow, unconditionally write to the next index, and then increment our length.

To do the write we have to be careful not to evaluate the memory we want to write to. At worst, it's truly uninitialized memory from the allocator. At best it's the bits of some old value we popped off. Either way, we can't just index to the memory and dereference it, because that will evaluate the memory as a valid instance of T. Worse, foo[idx] = x will try to call drop on the old value of foo[idx]!

The correct way to do this is with ptr::write, which just blindly overwrites the target address with the bits of the value we provide. No evaluation involved.

For push, if the old len (before push was called) is 0, then we want to write to the 0th index. So we should offset by the old len.

pub fn push(&mut self, elem: T) {
    if self.len == self.cap { self.grow(); }

    unsafe {
        ptr::write(self.ptr.offset(self.len as isize), elem);
    }

    // Can't fail, we'll OOM first.
    self.len += 1;
}

Easy! How about pop? Although this time the index we want to access is initialized, Rust won't just let us dereference the location of memory to move the value out, because that would leave the memory uninitialized! For this we need ptr::read, which just copies out the bits from the target address and intrprets it as a value of type T. This will leave the memory at this address logically uninitialized, even though there is in fact a perfectly good instance of T there.

For pop, if the old len is 1, we want to read out of the 0th index. So we should offset by the new len.

pub fn pop(&mut self) -> Option<T> {
    if self.len == 0 {
        None
    } else {
        self.len -= 1;
        unsafe {
            Some(ptr::read(self.ptr.offset(self.len as isize)))
        }
    }
}

Deallocating

Next we should implement Drop so that we don't massively leaks tons of resources. The easiest way is to just call pop until it yields None, and then deallocate our buffer. Note that calling pop is uneeded if T: !Drop. In theory we can ask Rust if T needs_drop and omit the calls to pop. However in practice LLVM is really good at removing simple side-effect free code like this, so I wouldn't bother unless you notice it's not being stripped (in this case it is).

We must not call heap::deallocate when self.cap == 0, as in this case we haven't actually allocated any memory.

impl<T> Drop for Vec<T> {
    fn drop(&mut self) {
        if self.cap != 0 {
            while let Some(_) = self.pop() { }

            let align = mem::min_align_of::<T>();
            let elem_size = mem::size_of::<T>();
            let num_bytes = elem_size * self.cap;
            unsafe {
                heap::deallocate(*self.ptr, num_bytes, align);
            }
        }
    }
}

Deref

Alright! We've got a decent minimal ArrayStack implemented. We can push, we can pop, and we can clean up after ourselves. However there's a whole mess of functionality we'd reasonably want. In particular, we have a proper array, but none of the slice functionality. That's actually pretty easy to solve: we can implement Deref<Target=[T]>. This will magically make our Vec coerce to and behave like a slice in all sorts of conditions.

All we need is slice::from_raw_parts.

use std::ops::Deref;

impl<T> Deref for Vec<T> {
    type Target = [T];
    fn deref(&self) -> &[T] {
        unsafe {
            ::std::slice::from_raw_parts(*self.ptr, self.len)
        }
    }
}

And let's do DerefMut too:

use std::ops::DerefMut;

impl<T> DerefMut for Vec<T> {
    fn deref_mut(&mut self) -> &mut [T] {
        unsafe {
            ::std::slice::from_raw_parts_mut(*self.ptr, self.len)
        }
    }
}

Now we have len, first, last, indexing, slicing, sorting, iter, iter_mut, and all other sorts of bells and whistles provided by slice. Sweet!

Insert and Remove

Something not provided but slice is insert and remove, so let's do those next.

Insert needs to shift all the elements at the target index to the right by one. To do this we need to use ptr::copy, which is our version of C's memmove. This copies some chunk of memory from one location to another, correctly handling the case where the source and destination overlap (which will definitely happen here).

If we insert at index i, we want to shift the [i .. len] to [i+1 .. len+1] using the old len.

pub fn insert(&mut self, index: usize, elem: T) {
    // Note: `<=` because it's valid to insert after everything
    // which would be equivalent to push.
    assert!(index <= self.len, "index out of bounds");
    if self.cap == self.len { self.grow(); }

    unsafe {
        if index < self.len {
            // ptr::copy(src, dest, len): "copy from source to dest len elems"
            ptr::copy(self.ptr.offset(index as isize),
                      self.ptr.offset(index as isize + 1),
                      len - index);
        }
        ptr::write(self.ptr.offset(index as isize), elem);
        self.len += 1;
    }
}

Remove behaves in the opposite manner. We need to shift all the elements from [i+1 .. len + 1] to [i .. len] using the new len.

pub fn remove(&mut self, index: usize) -> T {
    // Note: `<` because it's *not* valid to remove after everything
    assert!(index < self.len, "index out of bounds");
    unsafe {
        self.len -= 1;
        let result = ptr::read(self.ptr.offset(index as isize));
        ptr::copy(self.ptr.offset(index as isize + 1),
                  self.ptr.offset(index as isize),
                  len - index);
        result
    }
}

IntoIter

Let's move on to writing iterators. iter and iter_mut have already been written for us thanks to The Magic of Deref. However there's two interesting iterators that Vec provides that slices can't: into_iter and drain.

IntoIter consumes the Vec by-value, and can consequently yield its elements by-value. In order to enable this, IntoIter needs to take control of Vec's allocation.

IntoIter needs to be DoubleEnded as well, to enable reading from both ends. Reading from the back could just be implemented as calling pop, but reading from the front is harder. We could call remove(0) but that would be insanely expensive. Instead we're going to just use ptr::read to copy values out of either end of the Vec without mutating the buffer at all.

To do this we're going to use a very common C idiom for array iteration. We'll make two pointers; one that points to the start of the array, and one that points to one-element past the end. When we want an element from one end, we'll read out the value pointed to at that end and move the pointer over by one. When the two pointers are equal, we know we're done.

Note that the order of read and offset are reversed for next and next_back For next_back the pointer is always after the element it wants to read next, while for next the pointer is always at the element it wants to read next. To see why this is, consider the case where every element but one has been yielded.

The array looks like this:

          S  E
[X, X, X, O, X, X, X]

If E pointed directly at the element it wanted to yield next, it would be indistinguishable from the case where there are no more elements to yield.

So we're going to use the following struct:

struct IntoIter<T> {
    buf: Unique<T>,
    cap: usize,
    start: *const T,
    end: *const T,
}

And initialize it like this:

impl<T> Vec<T> {
    fn into_iter(self) -> IntoIter<T> {
        // Can't destructure Vec since it's Drop
        let ptr = self.ptr;
        let cap = self.cap;
        let len = self.len;

        // Make sure not to drop Vec since that will free the buffer
        mem::forget(self);

        unsafe {
            IntoIter {
                buf: ptr,
                cap: cap,
                start: *ptr,
                end: ptr.offset(len as isize),
            }
        }
    }
}

Here's iterating forward:

impl<T> Iterator for IntoIter<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        if self.start == self.end {
            None
        } else {
            unsafe {
                let result = ptr::read(self.start);
                self.start = self.start.offset(1);
                Some(result)
            }
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let len = self.end as usize - self.start as usize;
        (len, Some(len))
    }
}

And here's iterating backwards.

impl<T> DoubleEndedIterator for IntoIter<T> {
    fn next_back(&mut self) -> Option<T> {
        if self.start == self.end {
            None
        } else {
            unsafe {
                self.end = self.end.offset(-1);
                Some(ptr::read(self.end))
            }
        }
    }
}

Because IntoIter takes ownership of its allocation, it needs to implement Drop
to free it. However it *also* wants to implement Drop to drop any elements it
contains that weren't yielded.


```rust
impl<T> Drop for IntoIter<T> {
    fn drop(&mut self) {
        if self.cap != 0 {
            // drop any remaining elements
            for _ in &mut *self {}

            let align = mem::min_align_of::<T>();
            let elem_size = mem::size_of::<T>();
            let num_bytes = elem_size * self.cap;
            unsafe {
                heap::deallocate(*self.buf as *mut _, num_bytes, align);
            }
        }
    }
}

We've actually reached an interesting situation here: we've duplicated the logic for specifying a buffer and freeing its memory. Now that we've implemented it and identified actual logic duplication, this is a good time to perform some logic compression.

We're going to abstract out the (ptr, cap) pair and give them the logic for allocating, growing, and freeing:


struct RawVec<T> {
    ptr: Unique<T>,
    cap: usize,
}

impl<T> RawVec<T> {
    fn new() -> Self {
        assert!(mem::size_of::<T>() != 0, "TODO: implement ZST support");
        unsafe {
            RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: 0 }
        }
    }

    // unchanged from Vec
    fn grow(&mut self) {
        unsafe {
            let align = mem::min_align_of::<T>();
            let elem_size = mem::size_of::<T>();

            let (new_cap, ptr) = if self.cap == 0 {
                let ptr = heap::allocate(elem_size, align);
                (1, ptr)
            } else {
                let new_cap = 2 * self.cap;
                let ptr = heap::reallocate(*self.ptr as *mut _,
                                            self.cap * elem_size,
                                            new_cap * elem_size,
                                            align);
                (new_cap, ptr)
            };

            // If allocate or reallocate fail, we'll get `null` back
            if ptr.is_null() { oom() }

            self.ptr = Unique::new(ptr as *mut _);
            self.cap = new_cap;
        }
    }
}


impl<T> Drop for RawVec<T> {
    fn drop(&mut self) {
        if self.cap != 0 {
            let align = mem::min_align_of::<T>();
            let elem_size = mem::size_of::<T>();
            let num_bytes = elem_size * self.cap;
            unsafe {
                heap::deallocate(*self.ptr as *mut _, num_bytes, align);
            }
        }
    }
}

And change vec as follows:

pub struct Vec<T> {
    buf: RawVec<T>,
    len: usize,
}

impl<T> Vec<T> {
    fn ptr(&self) -> *mut T { *self.buf.ptr }

    fn cap(&self) -> usize { self.buf.cap }

    pub fn new() -> Self {
        Vec { buf: RawVec::new(), len: 0 }
    }

    // push/pop/insert/remove largely unchanged:
    // * `self.ptr -> self.ptr()`
    // * `self.cap -> self.cap()`
    // * `self.grow -> self.buf.grow()`
}

impl<T> Drop for Vec<T> {
    fn drop(&mut self) {
        while let Some(_) = self.pop() {}
        // deallocation is handled by RawVec
    }
}

And finally we can really simplify IntoIter:

struct IntoIter<T> {
    _buf: RawVec<T>, // we don't actually care about this. Just need it to live.
    start: *const T,
    end: *const T,
}

// next and next_back litterally unchanged since they never referred to the buf

impl<T> Drop for IntoIter<T> {
    fn drop(&mut self) {
        // only need to ensure all our elements are read;
        // buffer will clean itself up afterwards.
        for _ in &mut *self {}
    }
}

impl<T> Vec<T> {
    pub fn into_iter(self) -> IntoIter<T> {
        unsafe {
            // need to use ptr::read to unsafely move the buf out since it's
            // not Copy.
            let buf = ptr::read(&self.buf);
            let len = self.len;
            mem::forget(self);

            IntoIter {
                start: *buf.ptr,
                end: buf.ptr.offset(len as isize),
                _buf: buf,
            }
        }
    }
}

Much better.

Drain

Let's move on to Drain. Drain is largely the same as IntoIter, except that instead of consuming the Vec, it borrows the Vec and leaves its allocation free. For now we'll only implement the "basic" full-range version.

use std::marker::PhantomData;

struct Drain<'a, T: 'a> {
    vec: PhantomData<&'a mut Vec<T>>
    start: *const T,
    end: *const T,
}

impl<'a, T> Iterator for Drain<'a, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        if self.start == self.end {
            None

-- wait, this is seeming familiar. Let's do some more compression. Both IntoIter and Drain have the exact same structure, let's just factor it out.

struct RawValIter<T> {
    start: *const T,
    end: *const T,
}

impl<T> RawValIter<T> {
    // unsafe to construct because it has no associated lifetimes.
    // This is necessary to store a RawValIter in the same struct as
    // its actual allocation. OK since it's a private implementation
    // detail.
    unsafe fn new(slice: &[T]) -> Self {
        RawValIter {
            start: slice.as_ptr(),
            end: slice.as_ptr().offset(slice.len() as isize),
        }
    }
}

// Iterator and DoubleEndedIterator impls identical to IntoIter.

And IntoIter becomes the following:

pub struct IntoIter<T> {
    _buf: RawVec<T>, // we don't actually care about this. Just need it to live.
    iter: RawValIter<T>,
}

impl<T> Iterator for IntoIter<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> { self.iter.next() }
    fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
}

impl<T> DoubleEndedIterator for IntoIter<T> {
    fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
}

impl<T> Drop for IntoIter<T> {
    fn drop(&mut self) {
        for _ in &mut self.iter {}
    }
}

impl<T> Vec<T> {
    pub fn into_iter(self) -> IntoIter<T> {
        unsafe {
            let iter = RawValIter::new(&self);
            let buf = ptr::read(&self.buf);
            mem::forget(self);

            IntoIter {
                iter: iter,
                _buf: buf,
            }
        }
    }
}

Note that I've left a few quirks in this design to make upgrading Drain to work with arbitrary subranges a bit easier. In particular we could have RawValIter drain itself on drop, but that won't work right for a more complex Drain. We also take a slice to simplify Drain initialization.

Alright, now Drain is really easy:

use std::marker::PhantomData;

pub struct Drain<'a, T: 'a> {
    vec: PhantomData<&'a mut Vec<T>>,
    iter: RawValIter<T>,
}

impl<'a, T> Iterator for Drain<'a, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> { self.iter.next_back() }
    fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
}

impl<'a, T> DoubleEndedIterator for Drain<'a, T> {
    fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
}

impl<'a, T> Drop for Drain<'a, T> {
    fn drop(&mut self) {
        for _ in &mut self.iter {}
    }
}

impl<T> Vec<T> {
    pub fn drain(&mut self) -> Drain<T> {
        // this is a mem::forget safety thing. If Drain is forgotten, we just
        // leak the whole Vec's contents. Also we need to do this *eventually*
        // anyway, so why not do it now?
        self.len = 0;

        unsafe {
            Drain {
                iter: RawValIter::new(&self),
                vec: PhantomData,
            }
        }
    }
}

Handling Zero-Sized Types

It's time. We're going to fight the spectre that is zero-sized types. Safe Rust never needs to care about this, but Vec is very intensive on raw pointers and raw allocations, which are exactly the only two things that care about zero-sized types. We need to be careful of two things:

The raw allocator API has undefined behaviour if you pass in 0 for an allocation size.
raw pointer offsets are no-ops for zero-sized types, which will break our C-style pointer iterator

Thankfully we abstracted out pointer-iterators and allocating handling into RawValIter and RawVec respectively. How mysteriously convenient.

Allocating Zero-Sized Types

So if the allocator API doesn't support zero-sized allocations, what on earth do we store as our allocation? Why, heap::EMPTY of course! Almost every operation with a ZST is a no-op since ZSTs have exactly one value, and therefore no state needs to be considered to store or load them. This actually extends to ptr::read and ptr::write: they won't actually look at the pointer at all. As such we never need to change the pointer.

TODO

Iterating Zero-Sized Types