pull/10/head
Alexis Beingessner 10 years ago committed by Manish Goregaokar
parent fad7507d4e
commit 0498ad1926

520
vec.md

@ -1,11 +1,11 @@
% Example: Implementing Vec % Example: Implementing Vec
TODO: audit for non-ZST offsets from heap::empty
To bring everything together, we're going to write `std::Vec` from scratch. To bring everything together, we're going to write `std::Vec` from scratch.
Because the all the best tools for writing unsafe code are unstable, this Because the all the best tools for writing unsafe code are unstable, this
project will only work on nightly (as of Rust 1.2.0). project will only work on nightly (as of Rust 1.2.0).
# Layout # Layout
First off, we need to come up with the struct layout. Naively we want this First off, we need to come up with the struct layout. Naively we want this
@ -63,16 +63,19 @@ as `std::rt::heap::EMPTY`. There are quite a few places where we'll want to use
`heap::EMPTY` because there's no real allocation to talk about but `null` would `heap::EMPTY` because there's no real allocation to talk about but `null` would
make the compiler angry. make the compiler angry.
All of the `heap` API is totally unstable under the `alloc` feature, though. All of the `heap` API is totally unstable under the `heap_api` feature, though.
We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of
the `heap` API anyway, so let's just get that dependency over with. the `heap` API anyway, so let's just get that dependency over with.
# Allocating Memory # Allocating Memory
So: So:
```rust ```rust
#![feature(alloc)] #![feature(heap_api)]
use std::rt::heap::EMPTY; use std::rt::heap::EMPTY;
use std::mem; use std::mem;
@ -184,6 +187,10 @@ fn grow(&mut self) {
Nothing particularly tricky here. Just computing sizes and alignments and doing Nothing particularly tricky here. Just computing sizes and alignments and doing
some careful multiplication checks. some careful multiplication checks.
# Push and Pop # Push and Pop
Alright. We can initialize. We can allocate. Let's actually implement some Alright. We can initialize. We can allocate. Let's actually implement some
@ -240,6 +247,10 @@ pub fn pop(&mut self) -> Option<T> {
} }
``` ```
# Deallocating # Deallocating
Next we should implement Drop so that we don't massively leaks tons of resources. Next we should implement Drop so that we don't massively leaks tons of resources.
@ -270,6 +281,10 @@ impl<T> Drop for Vec<T> {
} }
``` ```
# Deref # Deref
Alright! We've got a decent minimal ArrayStack implemented. We can push, we can Alright! We've got a decent minimal ArrayStack implemented. We can push, we can
@ -311,6 +326,10 @@ impl<T> DerefMut for Vec<T> {
Now we have `len`, `first`, `last`, indexing, slicing, sorting, `iter`, `iter_mut`, Now we have `len`, `first`, `last`, indexing, slicing, sorting, `iter`, `iter_mut`,
and all other sorts of bells and whistles provided by slice. Sweet! and all other sorts of bells and whistles provided by slice. Sweet!
# Insert and Remove # Insert and Remove
Something *not* provided but slice is `insert` and `remove`, so let's do those next. Something *not* provided but slice is `insert` and `remove`, so let's do those next.
@ -362,6 +381,10 @@ pub fn remove(&mut self, index: usize) -> T {
} }
``` ```
# IntoIter # IntoIter
Let's move on to writing iterators. `iter` and `iter_mut` have already been Let's move on to writing iterators. `iter` and `iter_mut` have already been
@ -410,7 +433,22 @@ struct IntoIter<T> {
} }
``` ```
And initialize it like this: One last subtle detail: if our Vec is empty, we want to produce an empty iterator.
This will actually technically fall out doing the naive thing of:
```text
start = ptr
end = ptr.offset(len)
```
However because `offset` is marked as a GEP inbounds instruction, this will tell
llVM that ptr is allocated and won't alias other allocated memory. This is fine
for zero-sized types, as they can't alias anything. However if we're using
heap::EMPTY as a sentinel for a non-allocation for a *non-zero-sized* type,
this can cause undefined behaviour. Alas, we must therefore special case either
cap or len being 0 to not do the offset.
So this is what we end up with for initialization:
```rust ```rust
impl<T> Vec<T> { impl<T> Vec<T> {
@ -428,7 +466,12 @@ impl<T> Vec<T> {
buf: ptr, buf: ptr,
cap: cap, cap: cap,
start: *ptr, start: *ptr,
end: ptr.offset(len as isize), end: if cap == 0 {
// can't offset off this pointer, it's not allocated!
*ptr
} else {
ptr.offset(len as isize)
}
} }
} }
} }
@ -635,6 +678,10 @@ impl<T> Vec<T> {
Much better. Much better.
# Drain # Drain
Let's move on to Drain. Drain is largely the same as IntoIter, except that Let's move on to Drain. Drain is largely the same as IntoIter, except that
@ -674,7 +721,11 @@ impl<T> RawValIter<T> {
unsafe fn new(slice: &[T]) -> Self { unsafe fn new(slice: &[T]) -> Self {
RawValIter { RawValIter {
start: slice.as_ptr(), start: slice.as_ptr(),
end: slice.as_ptr().offset(slice.len() as isize), end: if slice.len() == 0 {
slice.as_ptr()
} else {
slice.as_ptr().offset(slice.len() as isize)
}
} }
} }
} }
@ -771,6 +822,8 @@ impl<T> Vec<T> {
``` ```
# Handling Zero-Sized Types # Handling Zero-Sized Types
It's time. We're going to fight the spectre that is zero-sized types. Safe Rust It's time. We're going to fight the spectre that is zero-sized types. Safe Rust
@ -781,13 +834,14 @@ zero-sized types. We need to be careful of two things:
* The raw allocator API has undefined behaviour if you pass in 0 for an * The raw allocator API has undefined behaviour if you pass in 0 for an
allocation size. allocation size.
* raw pointer offsets are no-ops for zero-sized types, which will break our * raw pointer offsets are no-ops for zero-sized types, which will break our
C-style pointer iterator C-style pointer iterator.
Thankfully we abstracted out pointer-iterators and allocating handling into Thankfully we abstracted out pointer-iterators and allocating handling into
RawValIter and RawVec respectively. How mysteriously convenient. RawValIter and RawVec respectively. How mysteriously convenient.
## Allocating Zero-Sized Types ## Allocating Zero-Sized Types
So if the allocator API doesn't support zero-sized allocations, what on earth So if the allocator API doesn't support zero-sized allocations, what on earth
@ -797,13 +851,457 @@ to be considered to store or load them. This actually extends to `ptr::read` and
`ptr::write`: they won't actually look at the pointer at all. As such we *never* need `ptr::write`: they won't actually look at the pointer at all. As such we *never* need
to change the pointer. to change the pointer.
TODO Note however that our previous reliance on running out of memory before overflow is
no longer valid with zero-sized types. We must explicitly guard against capacity
overflow for zero-sized types.
Due to our current architecture, all this means is writing 3 guards, one in each
method of RawVec.
```rust
impl<T> RawVec<T> {
fn new() -> Self {
unsafe {
// -1 is usize::MAX. This branch should be stripped at compile time.
let cap = if mem::size_of::<T>() == 0 { -1 } else { 0 };
// heap::EMPTY doubles as "unallocated" and "zero-sized allocation"
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: cap }
}
}
fn grow(&mut self) {
unsafe {
let elem_size = mem::size_of::<T>();
// since we set the capacity to usize::MAX when elem_size is
// 0, getting to here necessarily means the Vec is overfull.
assert!(elem_size != 0, "capacity overflow");
let align = mem::min_align_of::<T>();
let (new_cap, ptr) = if self.cap == 0 {
let ptr = heap::allocate(elem_size, align);
(1, ptr)
} else {
let new_cap = 2 * self.cap;
let ptr = heap::reallocate(*self.ptr as *mut _,
self.cap * elem_size,
new_cap * elem_size,
align);
(new_cap, ptr)
};
// If allocate or reallocate fail, we'll get `null` back
if ptr.is_null() { oom() }
self.ptr = Unique::new(ptr as *mut _);
self.cap = new_cap;
}
}
}
impl<T> Drop for RawVec<T> {
fn drop(&mut self) {
let elem_size = mem::size_of::<T>();
// don't free zero-sized allocations, as they were never allocated.
if self.cap != 0 && elem_size != 0 {
let align = mem::min_align_of::<T>();
let num_bytes = elem_size * self.cap;
unsafe {
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
}
}
}
}
```
That's it. We support pushing and popping zero-sized types now. Our iterators
(that aren't provided by slice Deref) are still busted, though.
## Iterating Zero-Sized Types ## Iterating Zero-Sized Types
TODO Zero-sized offsets are no-ops. This means that our current design will always
initialize `start` and `end` as the same value, and our iterators will yield
nothing. The current solution to this is to cast the pointers to integers,
increment, and then cast them back:
```
impl<T> RawValIter<T> {
unsafe fn new(slice: &[T]) -> Self {
RawValIter {
start: slice.as_ptr(),
end: if mem::size_of::<T>() == 0 {
((slice.as_ptr() as usize) + slice.len()) as *const _
} else if slice.len() == 0 {
slice.as_ptr()
} else {
slice.as_ptr().offset(slice.len() as isize)
}
}
}
}
```
Now we have a different bug. Instead of our iterators not running at all, our
iterators now run *forever*. We need to do the same trick in our iterator impls:
## Advanced Drain ```
impl<T> Iterator for RawValIter<T> {
type Item = T;
fn next(&mut self) -> Option<T> {
if self.start == self.end {
None
} else {
unsafe {
let result = ptr::read(self.start);
self.start = if mem::size_of::<T>() == 0 {
(self.start as usize + 1) as *const _
} else {
self.start.offset(1);
}
Some(result)
}
}
}
fn size_hint(&self) -> (usize, Option<usize>) {
let len = self.end as usize - self.start as usize;
(len, Some(len))
}
}
impl<T> DoubleEndedIterator for RawValIter<T> {
fn next_back(&mut self) -> Option<T> {
if self.start == self.end {
None
} else {
unsafe {
self.end = if mem::size_of::<T>() == 0 {
(self.end as usize - 1) as *const _
} else {
self.end.offset(-1);
}
Some(ptr::read(self.end))
}
}
}
}
```
And that's it. Iteration works!
# Advanced Drain
TODO? Not clear if informative TODO? Not clear if informative
# The Final Code
```rust
#![feature(unique)]
#![feature(heap_api)]
use std::ptr::{Unique, self};
use std::rt::heap;
use std::mem;
use std::ops::{Deref, DerefMut};
use std::marker::PhantomData;
struct RawVec<T> {
ptr: Unique<T>,
cap: usize,
}
impl<T> RawVec<T> {
fn new() -> Self {
unsafe {
// -1 is usize::MAX. This branch should be stripped at compile time.
let cap = if mem::size_of::<T>() == 0 { -1 } else { 0 };
// heap::EMPTY doubles as "unallocated" and "zero-sized allocation"
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: cap }
}
}
fn grow(&mut self) {
unsafe {
let elem_size = mem::size_of::<T>();
// since we set the capacity to usize::MAX when elem_size is
// 0, getting to here necessarily means the Vec is overfull.
assert!(elem_size != 0, "capacity overflow");
let align = mem::min_align_of::<T>();
let (new_cap, ptr) = if self.cap == 0 {
let ptr = heap::allocate(elem_size, align);
(1, ptr)
} else {
let new_cap = 2 * self.cap;
let ptr = heap::reallocate(*self.ptr as *mut _,
self.cap * elem_size,
new_cap * elem_size,
align);
(new_cap, ptr)
};
// If allocate or reallocate fail, we'll get `null` back
if ptr.is_null() { oom() }
self.ptr = Unique::new(ptr as *mut _);
self.cap = new_cap;
}
}
}
impl<T> Drop for RawVec<T> {
fn drop(&mut self) {
let elem_size = mem::size_of::<T>();
if self.cap != 0 && elem_size != 0 {
let align = mem::min_align_of::<T>();
let num_bytes = elem_size * self.cap;
unsafe {
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
}
}
}
}
pub struct Vec<T> {
buf: RawVec<T>,
len: usize,
}
impl<T> Vec<T> {
fn ptr(&self) -> *mut T { *self.buf.ptr }
fn cap(&self) -> usize { self.buf.cap }
pub fn new() -> Self {
Vec { buf: RawVec::new(), len: 0 }
}
pub fn push(&mut self, elem: T) {
if self.len == self.cap() { self.buf.grow(); }
unsafe {
ptr::write(self.ptr().offset(self.len as isize), elem);
}
// Can't fail, we'll OOM first.
self.len += 1;
}
pub fn pop(&mut self) -> Option<T> {
if self.len == 0 {
None
} else {
self.len -= 1;
unsafe {
Some(ptr::read(self.ptr().offset(self.len as isize)))
}
}
}
pub fn insert(&mut self, index: usize, elem: T) {
assert!(index <= self.len, "index out of bounds");
if self.cap() == self.len { self.buf.grow(); }
unsafe {
if index < self.len {
ptr::copy(self.ptr().offset(index as isize),
self.ptr().offset(index as isize + 1),
self.len - index);
}
ptr::write(self.ptr().offset(index as isize), elem);
self.len += 1;
}
}
pub fn remove(&mut self, index: usize) -> T {
assert!(index < self.len, "index out of bounds");
unsafe {
self.len -= 1;
let result = ptr::read(self.ptr().offset(index as isize));
ptr::copy(self.ptr().offset(index as isize + 1),
self.ptr().offset(index as isize),
self.len - index);
result
}
}
pub fn into_iter(self) -> IntoIter<T> {
unsafe {
let iter = RawValIter::new(&self);
let buf = ptr::read(&self.buf);
mem::forget(self);
IntoIter {
iter: iter,
_buf: buf,
}
}
}
pub fn drain(&mut self) -> Drain<T> {
// this is a mem::forget safety thing. If this is forgotten, we just
// leak the whole Vec's contents. Also we need to do this *eventually*
// anyway, so why not do it now?
self.len = 0;
unsafe {
Drain {
iter: RawValIter::new(&self),
vec: PhantomData,
}
}
}
}
impl<T> Drop for Vec<T> {
fn drop(&mut self) {
while let Some(_) = self.pop() {}
// allocation is handled by RawVec
}
}
impl<T> Deref for Vec<T> {
type Target = [T];
fn deref(&self) -> &[T] {
unsafe {
::std::slice::from_raw_parts(self.ptr(), self.len)
}
}
}
impl<T> DerefMut for Vec<T> {
fn deref_mut(&mut self) -> &mut [T] {
unsafe {
::std::slice::from_raw_parts_mut(self.ptr(), self.len)
}
}
}
struct RawValIter<T> {
start: *const T,
end: *const T,
}
impl<T> RawValIter<T> {
unsafe fn new(slice: &[T]) -> Self {
RawValIter {
start: slice.as_ptr(),
end: if mem::size_of::<T>() == 0 {
((slice.as_ptr() as usize) + slice.len()) as *const _
} else if slice.len() == 0 {
slice.as_ptr()
} else {
slice.as_ptr().offset(slice.len() as isize)
}
}
}
}
impl<T> Iterator for RawValIter<T> {
type Item = T;
fn next(&mut self) -> Option<T> {
if self.start == self.end {
None
} else {
unsafe {
let result = ptr::read(self.start);
self.start = self.start.offset(1);
Some(result)
}
}
}
fn size_hint(&self) -> (usize, Option<usize>) {
let len = self.end as usize - self.start as usize;
(len, Some(len))
}
}
impl<T> DoubleEndedIterator for RawValIter<T> {
fn next_back(&mut self) -> Option<T> {
if self.start == self.end {
None
} else {
unsafe {
self.end = self.end.offset(-1);
Some(ptr::read(self.end))
}
}
}
}
pub struct IntoIter<T> {
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
iter: RawValIter<T>,
}
impl<T> Iterator for IntoIter<T> {
type Item = T;
fn next(&mut self) -> Option<T> { self.iter.next() }
fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
}
impl<T> DoubleEndedIterator for IntoIter<T> {
fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
}
impl<T> Drop for IntoIter<T> {
fn drop(&mut self) {
for _ in &mut *self {}
}
}
pub struct Drain<'a, T: 'a> {
vec: PhantomData<&'a mut Vec<T>>,
iter: RawValIter<T>,
}
impl<'a, T> Iterator for Drain<'a, T> {
type Item = T;
fn next(&mut self) -> Option<T> { self.iter.next_back() }
fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
}
impl<'a, T> DoubleEndedIterator for Drain<'a, T> {
fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
}
impl<'a, T> Drop for Drain<'a, T> {
fn drop(&mut self) {
// pre-drain the iter
for _ in &mut self.iter {}
}
}
/// Abort the process, we're out of memory!
///
/// In practice this is probably dead code on most OSes
fn oom() {
::std::process::exit(-1);
}
```

Loading…
Cancel
Save