vec 1.0

10 years ago · 0498ad1926
parent fad7507d4e
commit 0498ad1926
1 changed files with 509 additions and 11 deletions
--- a/vec.md
+++ b/vec.md
@@ -1,11 +1,11 @@
 % Example: Implementing Vec
 TODO: audit for non-ZST offsets from heap::empty
 To bring everything together, we're going to write `std::Vec` from scratch.
 Because all the best tools for writing unsafe code are unstable, this
 project will only work on nightly (as of Rust 1.2.0).
 # Layout
 First off, we need to come up with the struct layout. Naively we want this
@@ -63,16 +63,19 @@ as `std::rt::heap::EMPTY`. There are quite a few places where we'll want to use
 `heap::EMPTY` because there's no real allocation to talk about but `null` would
 make the compiler angry.
-All of the `heap` API is totally unstable under the `alloc` feature, though.
+All of the `heap` API is totally unstable under the `heap_api` feature, though.
 We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of
 the `heap` API anyway, so let's just get that dependency over with.
 # Allocating Memory
 So:
 ```rust
-#![feature(alloc)]
+#![feature(heap_api)]
 use std::rt::heap::EMPTY;
 use std::mem;
@@ -184,6 +187,10 @@ fn grow(&mut self) {
 Nothing particularly tricky here. Just computing sizes and alignments and doing
 some careful multiplication checks.
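
As a rough illustration of those multiplication checks on current stable Rust, the arithmetic can be written with `checked_mul`. The helper name `grown_layout` is hypothetical and not part of the chapter's code; the sketch assumes the same doubling strategy as `grow` and also rejects sizes above `isize::MAX` bytes, which Rust allocations may not exceed.

```rust
use std::mem;

/// Hypothetical helper: given the current capacity, return the new capacity
/// and its size in bytes, or `None` if either computation would overflow.
fn grown_layout<T>(cap: usize) -> Option<(usize, usize)> {
    let elem_size = mem::size_of::<T>();
    // Start at one element, otherwise double the capacity.
    let new_cap = if cap == 0 { 1 } else { cap.checked_mul(2)? };
    // The byte size must fit in a usize...
    let bytes = new_cap.checked_mul(elem_size)?;
    // ...and must not exceed isize::MAX.
    if bytes > isize::MAX as usize {
        return None;
    }
    Some((new_cap, bytes))
}
```

A caller would treat `None` as "capacity overflow" and refuse to allocate, rather than passing a wrapped-around size to the allocator.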
 # Push and Pop
 Alright. We can initialize. We can allocate. Let's actually implement some
@@ -240,6 +247,10 @@ pub fn pop(&mut self) -> Option<T> {
 }
 ```
 # Deallocating
 Next we should implement Drop so that we don't massively leak tons of resources.
@@ -270,6 +281,10 @@ impl<T> Drop for Vec<T> {
 }
 ```
 # Deref
 Alright! We've got a decent minimal ArrayStack implemented. We can push, we can
@@ -311,6 +326,10 @@ impl<T> DerefMut for Vec<T> {
 Now we have `len`, `first`, `last`, indexing, slicing, sorting, `iter`, `iter_mut`,
 and all other sorts of bells and whistles provided by slice. Sweet!
 # Insert and Remove
 Something *not* provided by slice is `insert` and `remove`, so let's do those next.
@@ -362,6 +381,10 @@ pub fn remove(&mut self, index: usize) -> T {
 }
 ```
 # IntoIter
 Let's move on to writing iterators. `iter` and `iter_mut` have already been
@@ -410,7 +433,22 @@ struct IntoIter<T> {
 }
 ```
-And initialize it like this:
+One last subtle detail: if our Vec is empty, we want to produce an empty iterator.
 This technically falls out of doing the naive thing:
 ```text
 start = ptr
 end = ptr.offset(len)
 ```
 However, because `offset` is marked as a GEP inbounds instruction, this will tell
 LLVM that ptr is allocated and won't alias other allocated memory. This is fine
 for zero-sized types, as they can't alias anything. However if we're using
 heap::EMPTY as a sentinel for a non-allocation for a *non-zero-sized* type,
 this can cause undefined behaviour. Alas, we must therefore special case either
 cap or len being 0 to not do the offset.
 So this is what we end up with for initialization:
 ```rust
 impl<T> Vec<T> {
@@ -428,7 +466,12 @@ impl<T> Vec<T> {
                buf: ptr,
                cap: cap,
                start: *ptr,
-                end: ptr.offset(len as isize),
+                end: if cap == 0 {
                    // can't offset off this pointer, it's not allocated!
                    *ptr
                } else {
                    ptr.offset(len as isize)
                }
            }
        }
    }
@@ -635,6 +678,10 @@ impl<T> Vec<T> {
 Much better.
 # Drain
 Let's move on to Drain. Drain is largely the same as IntoIter, except that
@@ -674,7 +721,11 @@ impl<T> RawValIter<T> {
    unsafe fn new(slice: &[T]) -> Self {
        RawValIter {
            start: slice.as_ptr(),
-            end: slice.as_ptr().offset(slice.len() as isize),
+            end: if slice.len() == 0 {
                slice.as_ptr()
            } else {
                slice.as_ptr().offset(slice.len() as isize)
            }
        }
    }
 }
@@ -771,6 +822,8 @@ impl<T> Vec<T> {
 ```
 # Handling Zero-Sized Types
 It's time. We're going to fight the spectre that is zero-sized types. Safe Rust
@@ -781,13 +834,14 @@ zero-sized types. We need to be careful of two things:
 * The raw allocator API has undefined behaviour if you pass in 0 for an
  allocation size.
 * raw pointer offsets are no-ops for zero-sized types, which will break our
-  C-style pointer iterator
+  C-style pointer iterator.
 Thankfully we abstracted out pointer-iterators and allocation handling into
 RawValIter and RawVec respectively. How mysteriously convenient.
 ## Allocating Zero-Sized Types
 So if the allocator API doesn't support zero-sized allocations, what on earth
@@ -797,13 +851,457 @@ to be considered to store or load them. This actually extends to `ptr::read` and
 `ptr::write`: they won't actually look at the pointer at all. As such we *never* need
 to change the pointer.
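
A minimal sketch of that claim on current stable Rust, assuming `std::ptr::NonNull::dangling()` as the stable counterpart to the sentinel role `heap::EMPTY` plays in this chapter. The `Marker` type and `roundtrip_zst` function are hypothetical names used only for illustration.

```rust
use std::mem;
use std::ptr::{self, NonNull};

/// A zero-sized marker type, used only for this illustration.
#[derive(Debug, PartialEq)]
struct Marker;

/// Write a `Marker` through a pointer that was never allocated and read it
/// back. This is only sound because `Marker` is zero-sized: no memory is
/// actually loaded or stored.
fn roundtrip_zst() -> Marker {
    assert_eq!(mem::size_of::<Marker>(), 0);
    // Non-null and well-aligned, but not backed by any allocation.
    let p: *mut Marker = NonNull::<Marker>::dangling().as_ptr();
    unsafe {
        ptr::write(p, Marker);
        ptr::read(p)
    }
}
```

Doing the same with a non-zero-sized type would be undefined behaviour, which is why the sentinel pointer only ever stands in for zero-sized or empty buffers.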
-TODO
+Note however that our previous reliance on running out of memory before overflow is
 no longer valid with zero-sized types. We must explicitly guard against capacity
 overflow for zero-sized types.
 Due to our current architecture, all this means is writing 3 guards, one in each
 method of RawVec.
 ```rust
 impl<T> RawVec<T> {
    fn new() -> Self {
        unsafe {
            // !0 is usize::MAX. This branch should be stripped at compile time.
            let cap = if mem::size_of::<T>() == 0 { !0 } else { 0 };
            // heap::EMPTY doubles as "unallocated" and "zero-sized allocation"
            RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: cap }
        }
    }
    fn grow(&mut self) {
        unsafe {
            let elem_size = mem::size_of::<T>();
            // since we set the capacity to usize::MAX when elem_size is
            // 0, getting to here necessarily means the Vec is overfull.
            assert!(elem_size != 0, "capacity overflow");
            let align = mem::min_align_of::<T>();
            let (new_cap, ptr) = if self.cap == 0 {
                let ptr = heap::allocate(elem_size, align);
                (1, ptr)
            } else {
                let new_cap = 2 * self.cap;
                let ptr = heap::reallocate(*self.ptr as *mut _,
                                            self.cap * elem_size,
                                            new_cap * elem_size,
                                            align);
                (new_cap, ptr)
            };
            // If allocate or reallocate fail, we'll get `null` back
            if ptr.is_null() { oom() }
            self.ptr = Unique::new(ptr as *mut _);
            self.cap = new_cap;
        }
    }
 }
 impl<T> Drop for RawVec<T> {
    fn drop(&mut self) {
        let elem_size = mem::size_of::<T>();
        // don't free zero-sized allocations, as they were never allocated.
        if self.cap != 0 && elem_size != 0 {
            let align = mem::min_align_of::<T>();
            let num_bytes = elem_size * self.cap;
            unsafe {
                heap::deallocate(*self.ptr as *mut _, num_bytes, align);
            }
        }
    }
 }
 ```
 That's it. We support pushing and popping zero-sized types now. Our iterators
 (that aren't provided by slice Deref) are still busted, though.
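
To make the capacity trick concrete, here is a small stable-Rust sketch of just the capacity choice. The helper names `initial_cap` and `must_grow` are hypothetical and exist only for this illustration.

```rust
use std::mem;

/// Hypothetical helper: the capacity a freshly created buffer reports.
/// Zero-sized types claim the largest possible capacity up front, so the
/// growth path is only reached once `len` has hit `usize::MAX`.
fn initial_cap<T>() -> usize {
    if mem::size_of::<T>() == 0 { usize::MAX } else { 0 }
}

/// Hypothetical helper: whether a push at length `len` has to grow a
/// buffer with capacity `cap`.
fn must_grow(len: usize, cap: usize) -> bool {
    len == cap
}
```

With this choice, the very first push of a non-zero-sized element has to grow, while a zero-sized element only reaches the growth path (and its "capacity overflow" assertion) when the length is already `usize::MAX`.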
 ## Iterating Zero-Sized Types
-TODO
+Zero-sized offsets are no-ops. This means that our current design will always
 initialize `start` and `end` as the same value, and our iterators will yield
 nothing. The current solution to this is to cast the pointers to integers,
 increment, and then cast them back:
 ```
 impl<T> RawValIter<T> {
    unsafe fn new(slice: &[T]) -> Self {
        RawValIter {
            start: slice.as_ptr(),
            end: if mem::size_of::<T>() == 0 {
                ((slice.as_ptr() as usize) + slice.len()) as *const _
            } else if slice.len() == 0 {
                slice.as_ptr()
            } else {
                slice.as_ptr().offset(slice.len() as isize)
            }
        }
    }
 }
 ```
 Now we have a different bug. Instead of our iterators not running at all, our
 iterators now run *forever*. We need to do the same trick in our iterator impls:
-## Advanced Drain
+```
 impl<T> Iterator for RawValIter<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        if self.start == self.end {
            None
        } else {
            unsafe {
                let result = ptr::read(self.start);
                self.start = if mem::size_of::<T>() == 0 {
                    (self.start as usize + 1) as *const _
                } else {
                    self.start.offset(1)
                };
                Some(result)
            }
        }
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
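        // The pointer difference is in bytes; divide by the element size
        // (treating zero-sized types as size 1) to get the element count.
        let elem_size = mem::size_of::<T>();
        let len = (self.end as usize - self.start as usize)
                  / if elem_size == 0 { 1 } else { elem_size };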
        (len, Some(len))
    }
 }
 impl<T> DoubleEndedIterator for RawValIter<T> {
    fn next_back(&mut self) -> Option<T> {
        if self.start == self.end {
            None
        } else {
            unsafe {
                self.end = if mem::size_of::<T>() == 0 {
                    (self.end as usize - 1) as *const _
                } else {
                    self.end.offset(-1)
                };
                Some(ptr::read(self.end))
            }
        }
    }
 }
 ```
 And that's it. Iteration works!
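
For readers on current stable Rust, the same idea can be sketched without the unstable APIs used in this chapter. The type name `RawIter` is a hypothetical stand-in, not the chapter's `RawValIter`. Two details differ from the code above and are assumptions of this sketch: zero-sized values are read through `NonNull::dangling()` so the read is always aligned, and `size_hint` divides by the element size so it reports elements rather than bytes.

```rust
use std::mem;
use std::ptr::{self, NonNull};

/// Hypothetical stand-in for a by-value raw iterator over a slice.
struct RawIter<T> {
    start: *const T,
    end: *const T,
}

impl<T> RawIter<T> {
    /// Safety: the caller must make sure no yielded element is used or
    /// dropped a second time through the original slice.
    unsafe fn new(slice: &[T]) -> Self {
        let start = slice.as_ptr();
        let end = if mem::size_of::<T>() == 0 {
            // Zero-sized: treat the address as a plain counter.
            (start as usize).wrapping_add(slice.len()) as *const T
        } else {
            // One past the last element; also correct for empty slices.
            slice.as_ptr_range().end
        };
        RawIter { start, end }
    }
}

impl<T> Iterator for RawIter<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        if self.start == self.end {
            return None;
        }
        unsafe {
            if mem::size_of::<T>() == 0 {
                // Bump the counter; the read itself never touches memory.
                self.start = (self.start as usize).wrapping_add(1) as *const T;
                Some(ptr::read(NonNull::<T>::dangling().as_ptr()))
            } else {
                let item = ptr::read(self.start);
                self.start = self.start.add(1);
                Some(item)
            }
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let elem_size = mem::size_of::<T>();
        let bytes = (self.end as usize).wrapping_sub(self.start as usize);
        // Count elements, not bytes; zero-sized types count by ones.
        let len = bytes / if elem_size == 0 { 1 } else { elem_size };
        (len, Some(len))
    }
}
```

Iterating a slice of `()` with this sketch terminates after `slice.len()` steps, because the start "pointer" advances by one per call even though the element has no size.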
 # Advanced Drain
 TODO? Not clear if informative
 # The Final Code
 ```rust
 #![feature(unique)]
 #![feature(heap_api)]
 use std::ptr::{Unique, self};
 use std::rt::heap;
 use std::mem;
 use std::ops::{Deref, DerefMut};
 use std::marker::PhantomData;
 struct RawVec<T> {
    ptr: Unique<T>,
    cap: usize,
 }
 impl<T> RawVec<T> {
    fn new() -> Self {
        unsafe {
            // !0 is usize::MAX. This branch should be stripped at compile time.
            let cap = if mem::size_of::<T>() == 0 { !0 } else { 0 };
            // heap::EMPTY doubles as "unallocated" and "zero-sized allocation"
            RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: cap }
        }
    }
    fn grow(&mut self) {
        unsafe {
            let elem_size = mem::size_of::<T>();
            // since we set the capacity to usize::MAX when elem_size is
            // 0, getting to here necessarily means the Vec is overfull.
            assert!(elem_size != 0, "capacity overflow");
            let align = mem::min_align_of::<T>();
            let (new_cap, ptr) = if self.cap == 0 {
                let ptr = heap::allocate(elem_size, align);
                (1, ptr)
            } else {
                let new_cap = 2 * self.cap;
                let ptr = heap::reallocate(*self.ptr as *mut _,
                                            self.cap * elem_size,
                                            new_cap * elem_size,
                                            align);
                (new_cap, ptr)
            };
            // If allocate or reallocate fail, we'll get `null` back
            if ptr.is_null() { oom() }
            self.ptr = Unique::new(ptr as *mut _);
            self.cap = new_cap;
        }
    }
 }
 impl<T> Drop for RawVec<T> {
    fn drop(&mut self) {
        let elem_size = mem::size_of::<T>();
        if self.cap != 0 && elem_size != 0 {
            let align = mem::min_align_of::<T>();
            let num_bytes = elem_size * self.cap;
            unsafe {
                heap::deallocate(*self.ptr as *mut _, num_bytes, align);
            }
        }
    }
 }
 pub struct Vec<T> {
    buf: RawVec<T>,
    len: usize,
 }
 impl<T> Vec<T> {
    fn ptr(&self) -> *mut T { *self.buf.ptr }
    fn cap(&self) -> usize { self.buf.cap }
    pub fn new() -> Self {
        Vec { buf: RawVec::new(), len: 0 }
    }
    pub fn push(&mut self, elem: T) {
        if self.len == self.cap() { self.buf.grow(); }
        unsafe {
            ptr::write(self.ptr().offset(self.len as isize), elem);
        }
        // Can't fail, we'll OOM first.
        self.len += 1;
    }
    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            None
        } else {
            self.len -= 1;
            unsafe {
                Some(ptr::read(self.ptr().offset(self.len as isize)))
            }
        }
    }
    pub fn insert(&mut self, index: usize, elem: T) {
        assert!(index <= self.len, "index out of bounds");
        if self.cap() == self.len { self.buf.grow(); }
        unsafe {
            if index < self.len {
                ptr::copy(self.ptr().offset(index as isize),
                          self.ptr().offset(index as isize + 1),
                          self.len - index);
            }
            ptr::write(self.ptr().offset(index as isize), elem);
            self.len += 1;
        }
    }
    pub fn remove(&mut self, index: usize) -> T {
        assert!(index < self.len, "index out of bounds");
        unsafe {
            self.len -= 1;
            let result = ptr::read(self.ptr().offset(index as isize));
            ptr::copy(self.ptr().offset(index as isize + 1),
                      self.ptr().offset(index as isize),
                      self.len - index);
            result
        }
    }
    pub fn into_iter(self) -> IntoIter<T> {
        unsafe {
            let iter = RawValIter::new(&self);
            let buf = ptr::read(&self.buf);
            mem::forget(self);
            IntoIter {
                iter: iter,
                _buf: buf,
            }
        }
    }
    pub fn drain(&mut self) -> Drain<T> {
        unsafe {
            // Build the iterator first: it must see the current `len`.
            let iter = RawValIter::new(&self);
            // this is a mem::forget safety thing. If Drain is forgotten, we just
            // leak the whole Vec's contents. Also we need to do this *eventually*
            // anyway, so why not do it now?
            self.len = 0;
            Drain {
                iter: iter,
                vec: PhantomData,
            }
        }
    }
 }
 impl<T> Drop for Vec<T> {
    fn drop(&mut self) {
        while let Some(_) = self.pop() {}
        // allocation is handled by RawVec
    }
 }
 impl<T> Deref for Vec<T> {
    type Target = [T];
    fn deref(&self) -> &[T] {
        unsafe {
            ::std::slice::from_raw_parts(self.ptr(), self.len)
        }
    }
 }
 impl<T> DerefMut for Vec<T> {
    fn deref_mut(&mut self) -> &mut [T] {
        unsafe {
            ::std::slice::from_raw_parts_mut(self.ptr(), self.len)
        }
    }
 }
 struct RawValIter<T> {
    start: *const T,
    end: *const T,
 }
 impl<T> RawValIter<T> {
    unsafe fn new(slice: &[T]) -> Self {
        RawValIter {
            start: slice.as_ptr(),
            end: if mem::size_of::<T>() == 0 {
                ((slice.as_ptr() as usize) + slice.len()) as *const _
            } else if slice.len() == 0 {
                slice.as_ptr()
            } else {
                slice.as_ptr().offset(slice.len() as isize)
            }
        }
    }
 }
 impl<T> Iterator for RawValIter<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        if self.start == self.end {
            None
        } else {
            unsafe {
                let result = ptr::read(self.start);
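                // zero-sized offsets are no-ops, so bump the address by hand
                self.start = if mem::size_of::<T>() == 0 {
                    (self.start as usize + 1) as *const _
                } else {
                    self.start.offset(1)
                };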
                Some(result)
            }
        }
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
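        // The pointer difference is in bytes; divide by the element size
        // (treating zero-sized types as size 1) to get the element count.
        let elem_size = mem::size_of::<T>();
        let len = (self.end as usize - self.start as usize)
                  / if elem_size == 0 { 1 } else { elem_size };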
        (len, Some(len))
    }
 }
 impl<T> DoubleEndedIterator for RawValIter<T> {
    fn next_back(&mut self) -> Option<T> {
        if self.start == self.end {
            None
        } else {
            unsafe {
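                // zero-sized offsets are no-ops, so bump the address by hand
                self.end = if mem::size_of::<T>() == 0 {
                    (self.end as usize - 1) as *const _
                } else {
                    self.end.offset(-1)
                };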
                Some(ptr::read(self.end))
            }
        }
    }
 }
 pub struct IntoIter<T> {
    _buf: RawVec<T>, // we don't actually care about this. Just need it to live.
    iter: RawValIter<T>,
 }
 impl<T> Iterator for IntoIter<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> { self.iter.next() }
    fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
 }
 impl<T> DoubleEndedIterator for IntoIter<T> {
    fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
 }
 impl<T> Drop for IntoIter<T> {
    fn drop(&mut self) {
        for _ in &mut *self {}
    }
 }
 pub struct Drain<'a, T: 'a> {
    vec: PhantomData<&'a mut Vec<T>>,
    iter: RawValIter<T>,
 }
 impl<'a, T> Iterator for Drain<'a, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> { self.iter.next() }
    fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
 }
 impl<'a, T> DoubleEndedIterator for Drain<'a, T> {
    fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
 }
 impl<'a, T> Drop for Drain<'a, T> {
    fn drop(&mut self) {
        // pre-drain the iter
        for _ in &mut self.iter {}
    }
 }
 /// Abort the process, we're out of memory!
 ///
 /// In practice this is probably dead code on most OSes
 fn oom() {
    ::std::process::exit(-1);
 }
 ```