many many pnkfelix fixes

pull/10/head
Alexis Beingessner 9 years ago committed by Manish Goregaokar
parent 35f68b4107
commit fadf50dc7d

@ -9,19 +9,24 @@ is not always the case, however.
# Dynamically Sized Types (DSTs)
Rust also supports types without a statically known size. On the surface, this
is a bit nonsensical: Rust *must* know the size of something in order to work
with it! DSTs are generally produced as views, or through type-erasure of types
that *do* have a known size. Due to their lack of a statically known size, these
types can only exist *behind* some kind of pointer. They consequently produce a
*fat* pointer consisting of the pointer and the information that *completes*
them.
For instance, the slice type, `[T]`, is some statically unknown number of
elements stored contiguously. `&[T]` consequently consists of a `(&T, usize)`
pair that specifies where the slice starts, and how many elements it contains.
Similarly, Trait Objects support interface-oriented type erasure through a
`(data_ptr, vtable_ptr)` pair.
Rust in fact supports Dynamically Sized Types (DSTs): types without a statically
known size or alignment. On the surface, this is a bit nonsensical: Rust *must*
know the size and alignment of something in order to correctly work with it! In
this regard, DSTs are not normal types. Due to their lack of a statically known
size, these types can only exist behind some kind of pointer. Any pointer to a
DST consequently becomes a *fat* pointer consisting of the pointer and the
information that "completes" them (more on this below).
There are two major DSTs exposed by the language: trait objects, and slices.
A trait object represents some type that implements the traits it specifies.
The exact original type is *erased* in favour of runtime reflection
with a vtable containing all the information necessary to use the type.
This is the information that completes a trait object: a pointer to its vtable.
A slice is simply a view into some contiguous storage -- typically an array or
`Vec`. The information that completes a slice is just the number of elements
it points to.
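As a rough illustration of what "completing" information costs, the sketch below compares pointer sizes. The helper `words` is a hypothetical name introduced here, the `dyn Trait` spelling is the modern syntax for trait objects, and the two-word results reflect how current compilers represent these pointers rather than a written guarantee.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Size of a reference to `T`, measured in machine words (`usize`s).
/// (`words` is a helper invented for this sketch.)
fn words<T: ?Sized>() -> usize {
    size_of::<&T>() / size_of::<usize>()
}

fn main() {
    // A reference to a sized type is a single "thin" pointer.
    assert_eq!(words::<u8>(), 1);
    // A slice reference also carries the number of elements.
    assert_eq!(words::<[u8]>(), 2);
    // A trait object reference also carries a pointer to its vtable.
    assert_eq!(words::<dyn Debug>(), 2);

    let array = [1u8, 2, 3, 4];
    let slice: &[u8] = &array[1..];
    // The length travels with the pointer and is read back at runtime.
    assert_eq!(slice.len(), 3);
}
```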
Structs can actually store a single DST directly as their last field, but this
makes them a DST as well:
@ -34,8 +39,8 @@ struct Foo {
}
```
**NOTE: As of Rust 1.0 struct DSTs are broken if the last field has
a variable position based on its alignment.**
**NOTE: [As of Rust 1.0 struct DSTs are broken if the last field has
a variable position based on its alignment][dst-issue].**
@ -56,22 +61,32 @@ struct Baz {
}
```
On their own, ZSTs are, for obvious reasons, pretty useless. However as with
many curious layout choices in Rust, their potential is realized in a generic
context.
Rust largely understands that any operation that produces or stores a ZST can be
reduced to a no-op. For instance, a `HashSet<T>` can be effeciently implemented
as a thin wrapper around `HashMap<T, ()>` because all the operations `HashMap`
normally does to store and retrieve values will be completely stripped in
monomorphization.
Similarly `Result<(), ()>` and `Option<()>` are effectively just fancy `bool`s.
On their own, Zero Sized Types (ZSTs) are, for obvious reasons, pretty useless.
However as with many curious layout choices in Rust, their potential is realized
in a generic context: Rust largely understands that any operation that produces
or stores a ZST can be reduced to a no-op. First off, storing it doesn't even
make sense -- it doesn't occupy any space. Also there's only one value of that
type, so anything that loads it can just produce it from the aether -- which is
also a no-op since it doesn't occupy any space.
One of the most extreme examples of this is Sets and Maps. Given a
`Map<Key, Value>`, it is common to implement a `Set<Key>` as just a thin wrapper
around `Map<Key, UselessJunk>`. In many languages, this would necessitate
allocating space for UselessJunk and doing work to store and load UselessJunk
only to discard it. Proving this unnecessary would be a difficult analysis for
the compiler.
However in Rust, we can just say that `Set<Key> = Map<Key, ()>`. Now Rust
statically knows that every load and store is useless, and no allocation has any
size. The result is that the monomorphized code is basically a custom
implementation of a HashSet with none of the overhead that HashMap would have to
support values.
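A minimal sketch of that idea, assuming a hypothetical `Set` wrapper (this is not the standard library's `HashSet` source, just the same shape):

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::mem::size_of;

/// Hypothetical set type: a thin wrapper around a map whose values are `()`.
struct Set<K> {
    map: HashMap<K, ()>,
}

impl<K: Hash + Eq> Set<K> {
    fn new() -> Self {
        Set { map: HashMap::new() }
    }

    /// Returns `true` if the key was not already present.
    fn insert(&mut self, key: K) -> bool {
        self.map.insert(key, ()).is_none()
    }

    fn contains(&self, key: &K) -> bool {
        self.map.contains_key(key)
    }
}

fn main() {
    // The unit type occupies no space, and neither does an array of units.
    assert_eq!(size_of::<()>(), 0);
    assert_eq!(size_of::<[(); 1000]>(), 0);

    let mut set = Set::new();
    assert!(set.insert("a"));
    assert!(!set.insert("a")); // second insert finds the key already there
    assert!(set.contains(&"a"));
}
```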
Safe code need not worry about ZSTs, but *unsafe* code must be careful about the
consequence of types with no size. In particular, pointer offsets are no-ops,
and standard allocators (including jemalloc, the one used by Rust) generally
consider passing in `0` as Undefined Behaviour.
and standard allocators (including jemalloc, the one used by default in Rust)
generally consider passing in `0` for the size of an allocation as Undefined
Behaviour.
@ -93,11 +108,12 @@ return a Result in general, but a specific case actually is infallible. It's
actually possible to communicate this at the type level by returning a
`Result<T, Void>`. Consumers of the API can confidently unwrap such a Result
knowing that it's *statically impossible* for this value to be an `Err`, as
this would require providing a value of type Void.
this would require providing a value of type `Void`.
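One way a consumer might do this without any panicking path is sketched below. The names `parse_always` and `into_ok` are hypothetical and exist only for illustration.

```rust
/// An empty enum: it has no variants, so no value of it can be constructed.
enum Void {}

/// Hypothetical API that has to return a `Result` but cannot actually fail.
fn parse_always(input: &str) -> Result<usize, Void> {
    Ok(input.len())
}

/// Extracts the `Ok` value; there is no panic branch to fall into.
fn into_ok<T>(result: Result<T, Void>) -> T {
    match result {
        Ok(value) => value,
        // `Void` has no variants, so an empty match covers every case.
        Err(never) => match never {},
    }
}

fn main() {
    assert_eq!(into_ok(parse_always("abc")), 3);
}
```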
In principle, Rust can do some interesting analyses and optimizations based
on this fact. For instance, `Result<T, Void>` could be represented as just `T`,
because the Err case doesn't actually exist. The following *could* also compile:
because the `Err` case doesn't actually exist. The following *could* also
compile:
```rust,ignore
enum Void {}
@ -116,3 +132,6 @@ actually valid to construct, but dereferencing them is Undefined Behaviour
because that doesn't actually make sense. That is, you could model C's `void *`
type with `*const Void`, but this doesn't necessarily gain anything over using
e.g. `*const ()`, which *is* safe to randomly dereference.
[dst-issue]: https://github.com/rust-lang/rust/issues/26403

@ -2,7 +2,7 @@
Programmers in safe "high-level" languages face a fundamental dilemma. On one
hand, it would be *really* great to just say what you want and not worry about
how it's done. On the other hand, that can lead to some *really* poor
how it's done. On the other hand, that can lead to unacceptably poor
performance. It may be necessary to drop down to less clear or idiomatic
practices to get the performance characteristics you want. Or maybe you just
throw up your hands in disgust and decide to shell out to an implementation in
@ -12,21 +12,22 @@ Worse, when you want to talk directly to the operating system, you *have* to
talk to an unsafe language: *C*. C is ever-present and unavoidable. It's the
lingua-franca of the programming world.
Even other safe languages generally expose C interfaces for the world at large!
Regardless of *why* you're doing it, as soon as your program starts talking to
Regardless of why you're doing it, as soon as your program starts talking to
C it stops being safe.
With that said, Rust is *totally* a safe programming language.
Well, Rust *has* a safe programming language. Let's step back a bit.
Rust can be thought of as being composed of two
programming languages: *Safe* and *Unsafe*. Safe is For Reals Totally Safe.
Unsafe, unsurprisingly, is *not* For Reals Totally Safe. In fact, Unsafe lets
you do some really crazy unsafe things.
Rust can be thought of as being composed of two programming languages: *Safe
Rust* and *Unsafe Rust*. Safe Rust is For Reals Totally Safe. Unsafe Rust,
unsurprisingly, is *not* For Reals Totally Safe. In fact, Unsafe Rust lets you
do some really crazy unsafe things.
Safe is *the* Rust programming language. If all you do is write Safe Rust,
you will never have to worry about type-safety or memory-safety. You will never
endure a null or dangling pointer, or any of that Undefined Behaviour nonsense.
Safe Rust is the *true* Rust programming language. If all you do is write Safe
Rust, you will never have to worry about type-safety or memory-safety. You will
never endure a null or dangling pointer, or any of that Undefined Behaviour
nonsense.
*That's totally awesome*.
@ -69,17 +70,16 @@ language cares about is preventing the following things:
* A non-utf8 `str`
* Unwinding into another language
* Causing a [data race][race]
* Double-dropping a value
That's it. That's all the Undefined Behaviour baked into Rust. Of course, unsafe
functions and traits are free to declare arbitrary other constraints that a
program must maintain to avoid Undefined Behaviour. However these are generally
just things that will transitively lead to one of the above problems. Some
additional constraints may also derive from compiler intrinsics that make special
assumptions about how code can be optimized.
That's it. That's all the causes of Undefined Behaviour baked into Rust. Of
course, unsafe functions and traits are free to declare arbitrary other
constraints that a program must maintain to avoid Undefined Behaviour. However,
violations of these constraints will generally just transitively lead to one of
the above problems. Some additional constraints may also derive from compiler
intrinsics that make special assumptions about how code can be optimized.
Rust is otherwise quite permissive with respect to other dubious operations. Rust
considers it "safe" to:
Rust is otherwise quite permissive with respect to other dubious operations.
Rust considers it "safe" to:
* Deadlock
* Have a [race condition][race]

@ -12,21 +12,21 @@ The order, size, and alignment of fields is exactly what you would expect from C
or C++. Any type you expect to pass through an FFI boundary should have
`repr(C)`, as C is the lingua-franca of the programming world. This is also
necessary to soundly do more elaborate tricks with data layout such as
reintepretting values as a different type.
reinterpreting values as a different type.
However, the interaction with Rust's more exotic data layout features must be
kept in mind. Due to its dual purpose as "for FFI" and "for layout control",
`repr(C)` can be applied to types that will be nonsensical or problematic if
passed through the FFI boundary.
* ZSTs are still zero-sized, even though this is not a standard behaviour in
* ZSTs are still zero-sized, even though this is not a standard behaviour in
C, and is explicitly contrary to the behaviour of an empty type in C++, which
still consumes a byte of space.
* DSTs, tuples, and tagged unions are not a concept in C and as such are never
FFI safe.
* **The [drop flag][] will still be added**
* **If the type would have any [drop flags][], they will still be added**
* This is equivalent to one of `repr(u*)` (see the next section) for enums. The
chosen size is the default enum size for the target platform's C ABI. Note that
@ -39,10 +39,10 @@ compiled with certain flags.
# repr(u8), repr(u16), repr(u32), repr(u64)
These specify the size to make a C-like enum. If the discriminant overflows the
integer it has to fit in, it will be an error. You can manually ask Rust to
allow this by setting the overflowing element to explicitly be 0. However Rust
will not allow you to create an enum where two variants have the same
discriminant.
integer it has to fit in, it will produce a compile-time error. You can manually
ask Rust to allow this by setting the overflowing element to explicitly be 0.
However Rust will not allow you to create an enum where two variants have the
same discriminant.
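A small sketch of the sizing effect, using hypothetical enums `Small` and `Wide`:

```rust
use std::mem::size_of;

/// A C-like enum forced into a single byte.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
#[allow(dead_code)]
enum Small {
    First = 1,
    Second = 2,
    Last = 255, // the largest discriminant a `u8` can hold
}

/// The same idea with a two-byte discriminant.
#[repr(u16)]
#[allow(dead_code)]
enum Wide {
    Only = 256, // would not fit under `repr(u8)`
}

fn main() {
    assert_eq!(size_of::<Small>(), 1);
    assert_eq!(size_of::<Wide>(), 2);
    // Casting a C-like enum yields its discriminant.
    assert_eq!(Small::Last as u8, 255);
}
```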
On non-C-like enums, this will inhibit certain optimizations like the null-
pointer optimization.
@ -65,9 +65,12 @@ compiler might be able to paper over alignment issues with shifts and masks.
However if you take a reference to a packed field, it's unlikely that the
compiler will be able to emit code to avoid an unaligned load.
**[As of Rust 1.0 this can cause undefined behaviour.][ub loads]**
`repr(packed)` is not to be used lightly. Unless you have extreme requirements,
this should not be used.
This repr is a modifier on `repr(C)` and `repr(Rust)`.
[drop flag]: drop-flags.html
[drop flags]: drop-flags.html
[ub loads]: https://github.com/rust-lang/rust/issues/27060

@ -5,16 +5,17 @@ memory-safe and efficient, while avoiding garbage collection. Before getting
into the ownership system in detail, we will consider the motivation of this
design.
We will assume that you accept that garbage collection is not always an optimal
solution, and that it is desirable to manually manage memory to some extent.
If you do not accept this, might I interest you in a different language?
We will assume that you accept that garbage collection (GC) is not always an
optimal solution, and that it is desirable to manually manage memory in some
contexts. If you do not accept this, might I interest you in a different
language?
Regardless of your feelings on GC, it is pretty clearly a *massive* boon to
making code safe. You never have to worry about things going away *too soon*
(although whether you still *wanted* to be pointing at that thing is a different
issue...). This is a pervasive problem that C and C++ need to deal with.
Consider this simple mistake that all of us who have used a non-GC'd language
have made at one point:
issue...). This is a pervasive problem that C and C++ programs need to deal
with. Consider this simple mistake that all of us who have used a non-GC'd
language have made at one point:
```rust,ignore
fn as_str(data: &u32) -> &str {
@ -40,7 +41,7 @@ be forced to accept your program on the assumption that it is correct.
This will never happen to Rust. It's up to the programmer to prove to the
compiler that everything is sound.
Of course, rust's story around ownership is much more complicated than just
Of course, Rust's story around ownership is much more complicated than just
verifying that references don't escape the scope of their referent. That's
because ensuring pointers are always valid is much more complicated than this.
For instance in this code,

@ -1,5 +1,19 @@
% repr(Rust)
First and foremost, all types have an alignment specified in bytes. The
alignment of a type specifies what addresses are valid to store the value at. A
value of alignment `n` must only be stored at an address that is a multiple of
`n`. So alignment 2 means you must be stored at an even address, and 1 means
that you can be stored anywhere. Alignment is at least 1, and always a power of
2. Most primitives are generally aligned to their size, although this is
platform-specific behaviour. In particular, on x86 `u64` and `f64` may be only
aligned to 32 bits.
A type's size must always be a multiple of its alignment. This ensures that an
array of that type may always be indexed by offsetting by a multiple of its
size. Note that the size and alignment of a type may not be known
statically in the case of [dynamically sized types][dst].
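These two rules can be checked for any sized type with `std::mem`. In the sketch below, `size_and_align` is a hypothetical helper, and the printed numbers depend on the target platform.

```rust
use std::mem::{align_of, size_of};

/// Returns `(size, alignment)` of `T` after checking the two rules above.
/// (`size_and_align` is a helper invented for this sketch.)
fn size_and_align<T>() -> (usize, usize) {
    let (size, align) = (size_of::<T>(), align_of::<T>());
    // Alignment is at least 1 and always a power of 2.
    assert!(align.is_power_of_two());
    // Size is always a multiple of alignment.
    assert_eq!(size % align, 0);
    (size, align)
}

fn main() {
    println!("u8:        {:?}", size_and_align::<u8>());
    println!("u16:       {:?}", size_and_align::<u16>());
    println!("u64:       {:?}", size_and_align::<u64>());
    println!("(u8, u32): {:?}", size_and_align::<(u8, u32)>());
}
```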
Rust gives you the following ways to lay out composite data:
* structs (named product types)
@ -9,17 +23,10 @@ Rust gives you the following ways to lay out composite data:
An enum is said to be *C-like* if none of its variants have associated data.
For all these, individual fields are aligned to their preferred alignment. For
primitives this is usually equal to their size. For instance, a u32 will be
aligned to a multiple of 32 bits, and a u16 will be aligned to a multiple of 16
bits. Note that some primitives may be emulated on different platforms, and as
such may have strange alignment. For instance, a u64 on x86 may actually be
emulated as a pair of u32s, and thus only have 32-bit alignment.
Composite structures will have a preferred alignment equal to the maximum
of their fields' preferred alignment, and a size equal to a multiple of their
preferred alignment. This ensures that arrays of T can be correctly iterated
by offsetting by their size. So for instance,
Composite structures will have an alignment equal to the maximum
of their fields' alignment. Rust will consequently insert padding where
necessary to ensure that all fields are properly aligned and that the overall
type's size is a multiple of its alignment. For instance:
```rust
struct A {
@ -29,12 +36,24 @@ struct A {
}
```
will have a size that is a multiple of 32-bits, and 32-bit alignment.
will be 32-bit aligned assuming these primitives are aligned to their size.
It will therefore have a size that is a multiple of 32 bits. It will potentially
*really* become:
There is *no indirection* for these types; all data is stored contiguously as you would
expect in C. However with the exception of arrays (which are densely packed and
in-order), the layout of data is not by default specified in Rust. Given the two
following struct definitions:
```rust
struct A {
a: u8,
_pad1: [u8; 3], // to align `b`
b: u32,
c: u16,
_pad2: [u8; 2], // to make overall size multiple of 4
}
```
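The padded form can be compared against the original with `size_of`. This sketch makes two assumptions: both structs are marked `repr(C)` so that the field order is pinned as written, and `u32` and `u16` are aligned to their size on the target. `APadded` is a hypothetical name for the explicitly padded version.

```rust
use std::mem::{align_of, size_of};

/// The struct as written, with field order pinned by `repr(C)`.
#[repr(C)]
#[allow(dead_code)]
struct A {
    a: u8,
    b: u32,
    c: u16,
}

/// The same struct with the padding spelled out by hand.
#[repr(C)]
#[allow(dead_code)]
struct APadded {
    a: u8,
    _pad1: [u8; 3], // to align `b`
    b: u32,
    c: u16,
    _pad2: [u8; 2], // to make overall size a multiple of 4
}

fn main() {
    // Writing the padding explicitly changes neither size nor alignment.
    assert_eq!(size_of::<A>(), size_of::<APadded>());
    assert_eq!(align_of::<A>(), align_of::<APadded>());
    println!("size = {}, align = {}", size_of::<A>(), align_of::<A>());
}
```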
There is *no indirection* for these types; all data is stored contiguously as
you would expect in C. However with the exception of arrays (which are densely
packed and in-order), the layout of data is not by default specified in Rust.
Given the two following struct definitions:
```rust
struct A {
@ -48,13 +67,15 @@ struct B {
}
```
Rust *does* guarantee that two instances of A have their data laid out in exactly
the same way. However Rust *does not* guarantee that an instance of A has the same
field ordering or padding as an instance of B (in practice there's no *particular*
reason why they wouldn't, other than that its not currently guaranteed).
Rust *does* guarantee that two instances of A have their data laid out in
exactly the same way. However Rust *does not* guarantee that an instance of A
has the same field ordering or padding as an instance of B (in practice there's
no *particular* reason why they wouldn't, other than that it's not currently
guaranteed).
With A and B as written, this is basically nonsensical, but several other features
of Rust make it desirable for the language to play with data layout in complex ways.
With A and B as written, this is basically nonsensical, but several other
features of Rust make it desirable for the language to play with data layout in
complex ways.
For instance, consider this struct:
@ -66,10 +87,10 @@ struct Foo<T, U> {
}
```
Now consider the monomorphizations of `Foo<u32, u16>` and `Foo<u16, u32>`. If Rust lays out the
fields in the order specified, we expect it to *pad* the values in the struct to satisfy
their *alignment* requirements. So if Rust didn't reorder fields, we would expect Rust to
produce the following:
Now consider the monomorphizations of `Foo<u32, u16>` and `Foo<u16, u32>`. If
Rust lays out the fields in the order specified, we expect it to *pad* the
values in the struct to satisfy their *alignment* requirements. So if Rust
didn't reorder fields, we would expect Rust to produce the following:
```rust,ignore
struct Foo<u16, u32> {
@ -87,10 +108,11 @@ struct Foo<u32, u16> {
}
```
The latter case quite simply wastes space. An optimal use of space therefore requires
different monomorphizations to have *different field orderings*.
The latter case quite simply wastes space. An optimal use of space therefore
requires different monomorphizations to have *different field orderings*.
**Note: this is a hypothetical optimization that is not yet implemented in Rust 1.0**
**Note: this is a hypothetical optimization that is not yet implemented in Rust
1.0**
Enums make this consideration even more complicated. Naively, an enum such as:
@ -121,8 +143,10 @@ by using null as a special value. The net result is that
There are many types in Rust that are, or contain, "not null" pointers such as
`Box<T>`, `Vec<T>`, `String`, `&T`, and `&mut T`. Similarly, one can imagine
nested enums pooling their tags into a single descriminant, as they are by
nested enums pooling their tags into a single discriminant, as they are by
definition known to have a limited range of valid values. In principle enums can
use fairly elaborate algorithms to cache bits throughout nested types with
special constrained representations. As such it is *especially* desirable that
we leave enum layout unspecified today.
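The null pointer optimization is easy to observe with `size_of`. The sketch below only checks the pointer cases named above, plus one integer case for contrast.

```rust
use std::mem::size_of;

fn main() {
    // "Not null" pointers leave the null value free to represent `None`,
    // so wrapping them in `Option` costs no extra space.
    assert_eq!(size_of::<Option<Box<u32>>>(), size_of::<Box<u32>>());
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // Every bit pattern of a `u32` is a valid value, so `Option<u32>`
    // needs room for a separate tag.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
}
```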
[dst]: exotic-sizes.html#dynamically-sized-types-(dsts)

@ -1,29 +1,30 @@
% How Safe and Unsafe Interact
So what's the relationship between Safe and Unsafe? How do they interact?
So what's the relationship between Safe and Unsafe Rust? How do they interact?
Rust models the seperation between Safe and Unsafe with the `unsafe` keyword, which
can be thought as a sort of *foreign function interface* (FFI) between Safe and Unsafe.
This is the magic behind why we can say Safe is a safe language: all the scary unsafe
bits are relagated *exclusively* to FFI *just like every other safe language*.
Rust models the separation between Safe and Unsafe Rust with the `unsafe`
keyword, which can be thought of as a sort of *foreign function interface* (FFI)
between Safe and Unsafe Rust. This is the magic behind why we can say Safe Rust
is a safe language: all the scary unsafe bits are relegated *exclusively* to FFI
*just like every other safe language*.
However because one language is a subset of the other, the two can be cleanly
intermixed as long as the boundary between Safe and Unsafe is denoted with the
`unsafe` keyword. No need to write headers, initialize runtimes, or any of that
other FFI boiler-plate.
intermixed as long as the boundary between Safe and Unsafe Rust is denoted with
the `unsafe` keyword. No need to write headers, initialize runtimes, or any of
that other FFI boiler-plate.
There are several places `unsafe` can appear in Rust today, which can largely be
grouped into two categories:
* There are unchecked contracts here. To declare you understand this, I require
you to write `unsafe` elsewhere:
* On functions, `unsafe` is declaring the function to be unsafe to call. Users
of the function must check the documentation to determine what this means,
and then have to write `unsafe` somewhere to identify that they're aware of
the danger.
* On functions, `unsafe` is declaring the function to be unsafe to call.
Users of the function must check the documentation to determine what this
means, and then have to write `unsafe` somewhere to identify that they're
aware of the danger.
* On trait declarations, `unsafe` is declaring that *implementing* the trait
is an unsafe operation, as it has contracts that other unsafe code is free to
trust blindly. (More on this below.)
is an unsafe operation, as it has contracts that other unsafe code is free
to trust blindly. (More on this below.)
* I am declaring that I have, to the best of my knowledge, adhered to the
unchecked contracts:
@ -64,9 +65,9 @@ This means that Unsafe, **the royal vanguard of Undefined Behaviour**, has to be
*super paranoid* about generic safe code. Unsafe is free to trust *specific* safe
code (or else you would degenerate into infinite spirals of paranoid despair).
It is generally regarded as ok to trust the standard library to be correct, as
std is effectively an extension of the language (and you *really* just have to trust
the language). If `std` fails to uphold the guarantees it declares, then it's
basically a language bug.
`std` is effectively an extension of the language (and you *really* just have
to trust the language). If `std` fails to uphold the guarantees it declares,
then it's basically a language bug.
That said, it would be best to minimize *needlessly* relying on properties of
concrete safe code. Bugs happen! Of course, I must reinforce that this is only
@ -89,7 +90,7 @@ Ord for a type, but don't actually provide a proper total ordering, BTreeMap wil
get *really confused* and start making a total mess of itself. Data that is
inserted may be impossible to find!
But that's ok. BTreeMap is safe, so it guarantees that even if you give it a
But that's okay. BTreeMap is safe, so it guarantees that even if you give it a
*completely* garbage Ord implementation, it will still do something *safe*. You
won't start reading uninitialized memory or unallocated memory. In fact, BTreeMap
manages to not actually lose any of your data. When the map is dropped, all the
@ -104,7 +105,24 @@ Safe's responsibility to uphold.
But wouldn't it be grand if there was some way for Unsafe to trust *some* trait
contracts *somewhere*? This is the problem that unsafe traits tackle: by marking
*the trait itself* as unsafe *to implement*, Unsafe can trust the implementation
to be correct.
to uphold the trait's contract, although the trait implementation may be
incorrect in arbitrary other ways.
For instance, given a hypothetical UnsafeOrd trait, this is technically a valid
implementation:
```rust
# use std::cmp::Ordering;
# struct MyType;
# pub unsafe trait UnsafeOrd { fn cmp(&self, other: &Self) -> Ordering; }
unsafe impl UnsafeOrd for MyType {
fn cmp(&self, other: &Self) -> Ordering {
Ordering::Equal
}
}
```
But it's probably not the implementation you want.
Rust has traditionally avoided making traits unsafe because it makes Unsafe
pervasive, which is not desirable. Send and Sync are unsafe because

@ -1,8 +1,8 @@
% Working with Unsafe
Rust generally only gives us the tools to talk about Unsafe in a scoped and
binary manner. Unfortunately, reality is significantly more complicated than that.
For instance, consider the following toy function:
Rust generally only gives us the tools to talk about Unsafe Rust in a scoped and
binary manner. Unfortunately, reality is significantly more complicated than
that. For instance, consider the following toy function:
```rust
fn index(idx: usize, arr: &[u8]) -> Option<u8> {
@ -35,10 +35,15 @@ fn index(idx: usize, arr: &[u8]) -> Option<u8> {
This program is now unsound, and yet *we only modified safe code*. This is the
fundamental problem of safety: it's non-local. The soundness of our unsafe
operations necessarily depends on the state established by "safe" operations.
Although safety *is* modular (we *still* don't need to worry about about
unrelated safety issues like uninitialized memory), it quickly contaminates the
surrounding code.
operations necessarily depends on the state established by otherwise
"safe" operations.
Safety is modular in the sense that opting into unsafety doesn't require you
to consider arbitrary other kinds of badness. For instance, doing an unchecked
index into a slice doesn't mean you suddenly need to worry about the slice being
null or containing uninitialized memory. Nothing fundamentally changes. However
safety *isn't* modular in the sense that programs are inherently stateful and
your unsafe operations may depend on arbitrary other state.
Trickier than that is when we get into actual statefulness. Consider a simple
implementation of `Vec`:
@ -84,10 +89,10 @@ fn make_room(&mut self) {
}
```
This code is safe, but it is also completely unsound. Changing the capacity
violates the invariants of Vec (that `cap` reflects the allocated space in the
Vec). This is not something the rest of Vec can guard against. It *has* to
trust the capacity field because there's no way to verify it.
This code is 100% Safe Rust but it is also completely unsound. Changing the
capacity violates the invariants of Vec (that `cap` reflects the allocated space
in the Vec). This is not something the rest of Vec can guard against. It *has*
to trust the capacity field because there's no way to verify it.
`unsafe` does more than pollute a whole function: it pollutes a whole *module*.
Generally, the only bullet-proof way to limit the scope of unsafe code is at the
@ -102,9 +107,13 @@ as Vec.
It is therefore possible for us to write a completely safe abstraction that
relies on complex invariants. This is *critical* to the relationship between
Safe Rust and Unsafe Rust. We have already seen that Unsafe code must trust
*some* Safe code, but can't trust *arbitrary* Safe code. However if Unsafe
couldn't prevent client Safe code from messing with its state in arbitrary ways,
safety would be a lost cause.
*some* Safe code, but can't trust *generic* Safe code. It can't trust an
arbitrary implementor of a trait or any function that was passed to it to be
well-behaved in a way that safe code doesn't care about.
However if unsafe code couldn't prevent client safe code from messing with its
state in arbitrary ways, safety would be a lost cause. Thankfully, it *can*
prevent arbitrary code from messing with critical state due to privacy.
Safety lives!
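A minimal sketch of privacy protecting an invariant, assuming a hypothetical `Cursor` type in a hypothetical `buffer` module. The unsafe read is only sound because no code outside the module can touch `data` or `idx`.

```rust
mod buffer {
    /// Hypothetical type whose unsafe code relies on `idx < data.len()`.
    pub struct Cursor {
        data: Vec<u8>, // private: only this module can change it
        idx: usize,    // private: only this module can change it
    }

    impl Cursor {
        /// Rejects empty data so the invariant holds from the start.
        pub fn new(data: Vec<u8>) -> Option<Cursor> {
            if data.is_empty() {
                None
            } else {
                Some(Cursor { data, idx: 0 })
            }
        }

        /// Moves forward, wrapping around, so `idx` stays in bounds.
        pub fn advance(&mut self) {
            self.idx = (self.idx + 1) % self.data.len();
        }

        pub fn current(&self) -> u8 {
            // SAFETY: every method in this module keeps `idx < data.len()`,
            // and code outside the module cannot modify either field.
            unsafe { *self.data.get_unchecked(self.idx) }
        }
    }
}

fn main() {
    let mut cursor = buffer::Cursor::new(vec![10, 20]).unwrap();
    assert_eq!(cursor.current(), 10);
    cursor.advance();
    assert_eq!(cursor.current(), 20);
    // `cursor.idx = 99;` would not compile here: the field is private.
}
```

Making either field `pub` would let safe client code break the invariant, at which point `current` becomes unsound even though its body is unchanged.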
