last of the emphasis cleanup

pull/10/head
Alexis Beingessner 9 years ago committed by Manish Goregaokar
parent 42582a28ed
commit 37d42cdcef

@ -12,11 +12,13 @@ it's impossible to alias a mutable reference, so it's impossible to perform a
data race. Interior mutability makes this more complicated, which is largely why
we have the Send and Sync traits (see below).
However Rust *does not* prevent general race conditions. This is
pretty fundamentally impossible, and probably honestly undesirable. Your hardware
is racy, your OS is racy, the other programs on your computer are racy, and the
world this all runs in is racy. Any system that could genuinely claim to prevent
*all* race conditions would be pretty awful to use, if not just incorrect.
**However Rust does not prevent general race conditions.**
This is pretty fundamentally impossible, and probably honestly undesirable. Your
hardware is racy, your OS is racy, the other programs on your computer are racy,
and the world this all runs in is racy. Any system that could genuinely claim to
prevent *all* race conditions would be pretty awful to use, if not just
incorrect.
So it's perfectly "fine" for a Safe Rust program to get deadlocked or do
something incredibly stupid with incorrect synchronization. Obviously such a
@ -46,7 +48,7 @@ thread::spawn(move || {
});
// Index with the value loaded from the atomic. This is safe because we
// read the atomic memory only once, and then pass a *copy* of that value
// read the atomic memory only once, and then pass a copy of that value
// to the Vec's indexing implementation. This indexing will be correctly
// bounds checked, and there's no chance of the value getting changed
// in the middle. However our program may panic if the thread we spawned
@ -75,7 +77,7 @@ thread::spawn(move || {
if idx.load(Ordering::SeqCst) < data.len() {
unsafe {
// Incorrectly loading the idx *after* we did the bounds check.
// Incorrectly loading the idx after we did the bounds check.
// It could have changed. This is a race condition, *and dangerous*
// because we decided to do `get_unchecked`, which is `unsafe`.
println!("{}", data.get_unchecked(idx.load(Ordering::SeqCst)));

@ -70,7 +70,7 @@ struct B {
Rust *does* guarantee that two instances of A have their data laid out in
exactly the same way. However Rust *does not* guarantee that an instance of A
has the same field ordering or padding as an instance of B (in practice there's
no *particular* reason why they wouldn't, other than that its not currently
no particular reason why they wouldn't, other than that its not currently
guaranteed).
With A and B as written, this is basically nonsensical, but several other
@ -88,9 +88,9 @@ struct Foo<T, U> {
```
Now consider the monomorphizations of `Foo<u32, u16>` and `Foo<u16, u32>`. If
Rust lays out the fields in the order specified, we expect it to *pad* the
values in the struct to satisfy their *alignment* requirements. So if Rust
didn't reorder fields, we would expect Rust to produce the following:
Rust lays out the fields in the order specified, we expect it to pad the
values in the struct to satisfy their alignment requirements. So if Rust
didn't reorder fields, we would expect it to produce the following:
```rust,ignore
struct Foo<u16, u32> {
@ -112,7 +112,7 @@ The latter case quite simply wastes space. An optimal use of space therefore
requires different monomorphizations to have *different field orderings*.
**Note: this is a hypothetical optimization that is not yet implemented in Rust
**1.0
1.0**
Enums make this consideration even more complicated. Naively, an enum such as:
@ -128,7 +128,7 @@ would be laid out as:
```rust
struct FooRepr {
data: u64, // this is *really* either a u64, u32, or u8 based on `tag`
data: u64, // this is either a u64, u32, or u8 based on `tag`
tag: u8, // 0 = A, 1 = B, 2 = C
}
```

@ -5,7 +5,7 @@ So what's the relationship between Safe and Unsafe Rust? How do they interact?
Rust models the separation between Safe and Unsafe Rust with the `unsafe`
keyword, which can be thought as a sort of *foreign function interface* (FFI)
between Safe and Unsafe Rust. This is the magic behind why we can say Safe Rust
is a safe language: all the scary unsafe bits are relegated *exclusively* to FFI
is a safe language: all the scary unsafe bits are relegated exclusively to FFI
*just like every other safe language*.
However because one language is a subset of the other, the two can be cleanly
@ -61,13 +61,13 @@ The need for unsafe traits boils down to the fundamental property of safe code:
**No matter how completely awful Safe code is, it can't cause Undefined
Behaviour.**
This means that Unsafe, **the royal vanguard of Undefined Behaviour**, has to be
*super paranoid* about generic safe code. Unsafe is free to trust *specific* safe
code (or else you would degenerate into infinite spirals of paranoid despair).
It is generally regarded as ok to trust the standard library to be correct, as
`std` is effectively an extension of the language (and you *really* just have
to trust the language). If `std` fails to uphold the guarantees it declares,
then it's basically a language bug.
This means that Unsafe Rust, **the royal vanguard of Undefined Behaviour**, has to be
*super paranoid* about generic safe code. To be clear, Unsafe Rust is totally free to trust
specific safe code. Anything else would degenerate into infinite spirals of
paranoid despair. In particular it's generally regarded as ok to trust the standard library
to be correct. `std` is effectively an extension of the language, and you
really just have to trust the language. If `std` fails to uphold the
guarantees it declares, then it's basically a language bug.
That said, it would be best to minimize *needlessly* relying on properties of
concrete safe code. Bugs happen! Of course, I must reinforce that this is only
@ -75,36 +75,36 @@ a concern for Unsafe code. Safe code can blindly trust anyone and everyone
as far as basic memory-safety is concerned.
On the other hand, safe traits are free to declare arbitrary contracts, but because
implementing them is Safe, Unsafe can't trust those contracts to actually
implementing them is safe, unsafe code can't trust those contracts to actually
be upheld. This is different from the concrete case because *anyone* can
randomly implement the interface. There is something fundamentally different
about trusting a *particular* piece of code to be correct, and trusting *all the
about trusting a particular piece of code to be correct, and trusting *all the
code that will ever be written* to be correct.
For instance Rust has `PartialOrd` and `Ord` traits to try to differentiate
between types which can "just" be compared, and those that actually implement a
*total* ordering. Pretty much every API that wants to work with data that can be
compared *really* wants Ord data. For instance, a sorted map like BTreeMap
total ordering. Pretty much every API that wants to work with data that can be
compared wants Ord data. For instance, a sorted map like BTreeMap
*doesn't even make sense* for partially ordered types. If you claim to implement
Ord for a type, but don't actually provide a proper total ordering, BTreeMap will
get *really confused* and start making a total mess of itself. Data that is
inserted may be impossible to find!
But that's okay. BTreeMap is safe, so it guarantees that even if you give it a
*completely* garbage Ord implementation, it will still do something *safe*. You
won't start reading uninitialized memory or unallocated memory. In fact, BTreeMap
completely garbage Ord implementation, it will still do something *safe*. You
won't start reading uninitialized or unallocated memory. In fact, BTreeMap
manages to not actually lose any of your data. When the map is dropped, all the
destructors will be successfully called! Hooray!
However BTreeMap is implemented using a modest spoonful of Unsafe (most collections
are). That means that it is not necessarily *trivially true* that a bad Ord
implementation will make BTreeMap behave safely. Unsafe must be sure not to rely
on Ord *where safety is at stake*. Ord is provided by Safe, and safety is not
Safe's responsibility to uphold.
However BTreeMap is implemented using a modest spoonful of Unsafe Rust (most collections
are). That means that it's not necessarily *trivially true* that a bad Ord
implementation will make BTreeMap behave safely. BTreeMap must be sure not to rely
on Ord *where safety is at stake*. Ord is provided by safe code, and safety is not
safe code's responsibility to uphold.
But wouldn't it be grand if there was some way for Unsafe to trust *some* trait
But wouldn't it be grand if there was some way for Unsafe to trust some trait
contracts *somewhere*? This is the problem that unsafe traits tackle: by marking
*the trait itself* as unsafe *to implement*, Unsafe can trust the implementation
*the trait itself* as unsafe to implement, unsafe code can trust the implementation
to uphold the trait's contract. Although the trait implementation may be
incorrect in arbitrary other ways.
@ -126,7 +126,7 @@ But it's probably not the implementation you want.
Rust has traditionally avoided making traits unsafe because it makes Unsafe
pervasive, which is not desirable. Send and Sync are unsafe is because thread
safety is a *fundamental property* that Unsafe cannot possibly hope to defend
safety is a *fundamental property* that unsafe code cannot possibly hope to defend
against in the same way it would defend against a bad Ord implementation. The
only way to possibly defend against thread-unsafety would be to *not use
threading at all*. Making every load and store atomic isn't even sufficient,
@ -135,10 +135,10 @@ in memory. For instance, the pointer and capacity of a Vec must be in sync.
Even concurrent paradigms that are traditionally regarded as Totally Safe like
message passing implicitly rely on some notion of thread safety -- are you
really message-passing if you pass a *pointer*? Send and Sync therefore require
some *fundamental* level of trust that Safe code can't provide, so they must be
really message-passing if you pass a pointer? Send and Sync therefore require
some fundamental level of trust that Safe code can't provide, so they must be
unsafe to implement. To help obviate the pervasive unsafety that this would
introduce, Send (resp. Sync) is *automatically* derived for all types composed only
introduce, Send (resp. Sync) is automatically derived for all types composed only
of Send (resp. Sync) values. 99% of types are Send and Sync, and 99% of those
never actually say it (the remaining 1% is overwhelmingly synchronization
primitives).

@ -8,20 +8,19 @@ captures this with through the `Send` and `Sync` traits.
* A type is Send if it is safe to send it to another thread. A type is Sync if
* it is safe to share between threads (`&T` is Send).
Send and Sync are *very* fundamental to Rust's concurrency story. As such, a
Send and Sync are fundamental to Rust's concurrency story. As such, a
substantial amount of special tooling exists to make them work right. First and
foremost, they're *unsafe traits*. This means that they are unsafe *to
implement*, and other unsafe code can *trust* that they are correctly
foremost, they're [unsafe traits][]. This means that they are unsafe to
implement, and other unsafe code can that they are correctly
implemented. Since they're *marker traits* (they have no associated items like
methods), correctly implemented simply means that they have the intrinsic
properties an implementor should have. Incorrectly implementing Send or Sync can
cause Undefined Behaviour.
Send and Sync are also what Rust calls *opt-in builtin traits*. This means that,
unlike every other trait, they are *automatically* derived: if a type is
composed entirely of Send or Sync types, then it is Send or Sync. Almost all
primitives are Send and Sync, and as a consequence pretty much all types you'll
ever interact with are Send and Sync.
Send and Sync are also automatically derived traits. This means that, unlike
every other trait, if a type is composed entirely of Send or Sync types, then it
is Send or Sync. Almost all primitives are Send and Sync, and as a consequence
pretty much all types you'll ever interact with are Send and Sync.
Major exceptions include:
@ -37,13 +36,12 @@ sense, one could argue that it would be "fine" for them to be marked as thread
safe.
However it's important that they aren't thread safe to prevent types that
*contain them* from being automatically marked as thread safe. These types have
contain them from being automatically marked as thread safe. These types have
non-trivial untracked ownership, and it's unlikely that their author was
necessarily thinking hard about thread safety. In the case of Rc, we have a nice
example of a type that contains a `*mut` that is *definitely* not thread safe.
example of a type that contains a `*mut` that is definitely not thread safe.
Types that aren't automatically derived can *opt-in* to Send and Sync by simply
implementing them:
Types that aren't automatically derived can simply implement them if desired:
```rust
struct MyBox(*mut u8);
@ -52,12 +50,13 @@ unsafe impl Send for MyBox {}
unsafe impl Sync for MyBox {}
```
In the *incredibly rare* case that a type is *inappropriately* automatically
derived to be Send or Sync, then one can also *unimplement* Send and Sync:
In the *incredibly rare* case that a type is inappropriately automatically
derived to be Send or Sync, then one can also unimplement Send and Sync:
```rust
#![feature(optin_builtin_traits)]
// I have some magic semantics for some synchronization primitive!
struct SpecialThreadToken(u8);
impl !Send for SpecialThreadToken {}
@ -77,3 +76,5 @@ largely behave like an `&` or `&mut` into the collection.
TODO: better explain what can or can't be Send or Sync. Sufficient to appeal
only to data races?
[unsafe traits]: safe-unsafe-meaning.html

@ -1,11 +1,11 @@
% Subtyping and Variance
Although Rust doesn't have any notion of structural inheritance, it *does*
include subtyping. In Rust, subtyping derives entirely from *lifetimes*. Since
include subtyping. In Rust, subtyping derives entirely from lifetimes. Since
lifetimes are scopes, we can partially order them based on the *contains*
(outlives) relationship. We can even express this as a generic bound.
Subtyping on lifetimes in terms of that relationship: if `'a: 'b` ("a contains
Subtyping on lifetimes is in terms of that relationship: if `'a: 'b` ("a contains
b" or "a outlives b"), then `'a` is a subtype of `'b`. This is a large source of
confusion, because it seems intuitively backwards to many: the bigger scope is a
*subtype* of the smaller scope.
@ -72,7 +72,7 @@ to be able to pass `&&'static str` where an `&&'a str` is expected. The
additional level of indirection does not change the desire to be able to pass
longer lived things where shorted lived things are expected.
However this logic *does not* apply to `&mut`. To see why `&mut` should
However this logic doesn't apply to `&mut`. To see why `&mut` should
be invariant over T, consider the following code:
```rust,ignore
@ -109,7 +109,7 @@ between `'a` and T is that `'a` is a property of the reference itself,
while T is something the reference is borrowing. If you change T's type, then
the source still remembers the original type. However if you change the
lifetime's type, no one but the reference knows this information, so it's fine.
Put another way, `&'a mut T` owns `'a`, but only *borrows* T.
Put another way: `&'a mut T` owns `'a`, but only *borrows* T.
`Box` and `Vec` are interesting cases because they're variant, but you can
definitely store values in them! This is where Rust gets really clever: it's
@ -118,7 +118,7 @@ in them *via a mutable reference*! The mutable reference makes the whole type
invariant, and therefore prevents you from smuggling a short-lived type into
them.
Being variant *does* allows `Box` and `Vec` to be weakened when shared
Being variant allows `Box` and `Vec` to be weakened when shared
immutably. So you can pass a `&Box<&'static str>` where a `&Box<&'a str>` is
expected.
@ -126,7 +126,7 @@ However what should happen when passing *by-value* is less obvious. It turns out
that, yes, you can use subtyping when passing by-value. That is, this works:
```rust
fn get_box<'a>(str: &'a u8) -> Box<&'a str> {
fn get_box<'a>(str: &'a str) -> Box<&'a str> {
// string literals are `&'static str`s
Box::new("hello")
}
@ -150,7 +150,7 @@ signature:
fn foo(&'a str) -> usize;
```
This signature claims that it can handle any `&str` that lives *at least* as
This signature claims that it can handle any `&str` that lives at least as
long as `'a`. Now if this signature was variant over `&'a str`, that
would mean
@ -159,10 +159,12 @@ fn foo(&'static str) -> usize;
```
could be provided in its place, as it would be a subtype. However this function
has a *stronger* requirement: it says that it can *only* handle `&'static str`s,
and nothing else. Therefore functions are not variant over their arguments.
has a stronger requirement: it says that it can only handle `&'static str`s,
and nothing else. Giving `&'a str`s to it would be unsound, as it's free to
assume that what it's given lives forever. Therefore functions are not variant
over their arguments.
To see why `Fn(T) -> U` should be *variant* over U, consider the following
To see why `Fn(T) -> U` should be variant over U, consider the following
function signature:
```rust,ignore
@ -177,7 +179,7 @@ therefore completely reasonable to provide
fn foo(usize) -> &'static str;
```
in its place. Therefore functions *are* variant over their return type.
in its place. Therefore functions are variant over their return type.
`*const` has the exact same semantics as `&`, so variance follows. `*mut` on the
other hand can dereference to an `&mut` whether shared or not, so it is marked

@ -31,12 +31,12 @@ panics can only be caught by the parent thread. This means catching a panic
requires spinning up an entire OS thread! This unfortunately stands in conflict
to Rust's philosophy of zero-cost abstractions.
There is an *unstable* API called `catch_panic` that enables catching a panic
There is an unstable API called `catch_panic` that enables catching a panic
without spawning a thread. Still, we would encourage you to only do this
sparingly. In particular, Rust's current unwinding implementation is heavily
optimized for the "doesn't unwind" case. If a program doesn't unwind, there
should be no runtime cost for the program being *ready* to unwind. As a
consequence, *actually* unwinding will be more expensive than in e.g. Java.
consequence, actually unwinding will be more expensive than in e.g. Java.
Don't build your programs to unwind under normal circumstances. Ideally, you
should only panic for programming errors or *extreme* problems.

@ -60,7 +60,7 @@ of memory at once (e.g. half the theoretical address space). As such it's
like the standard library as much as possible, so we'll just kill the whole
program.
We said we don't want to use intrinsics, so doing *exactly* what `std` does is
We said we don't want to use intrinsics, so doing exactly what `std` does is
out. Instead, we'll call `std::process::exit` with some random number.
```rust
@ -84,7 +84,7 @@ But Rust's only supported allocator API is so low level that we'll need to do a
fair bit of extra work. We also need to guard against some special
conditions that can occur with really large allocations or empty allocations.
In particular, `ptr::offset` will cause us *a lot* of trouble, because it has
In particular, `ptr::offset` will cause us a lot of trouble, because it has
the semantics of LLVM's GEP inbounds instruction. If you're fortunate enough to
not have dealt with this instruction, here's the basic story with GEP: alias
analysis, alias analysis, alias analysis. It's super important to an optimizing
@ -102,7 +102,7 @@ As a simple example, consider the following fragment of code:
If the compiler can prove that `x` and `y` point to different locations in
memory, the two operations can in theory be executed in parallel (by e.g.
loading them into different registers and working on them independently).
However in *general* the compiler can't do this because if x and y point to
However the compiler can't do this in general because if x and y point to
the same location in memory, the operations need to be done to the same value,
and they can't just be merged afterwards.
@ -118,7 +118,7 @@ possible.
So that's what GEP's about, how can it cause us trouble?
The first problem is that we index into arrays with unsigned integers, but
GEP (and as a consequence `ptr::offset`) takes a *signed integer*. This means
GEP (and as a consequence `ptr::offset`) takes a signed integer. This means
that half of the seemingly valid indices into an array will overflow GEP and
actually go in the wrong direction! As such we must limit all allocations to
`isize::MAX` elements. This actually means we only need to worry about
@ -138,7 +138,7 @@ However since this is a tutorial, we're not going to be particularly optimal
here, and just unconditionally check, rather than use clever platform-specific
`cfg`s.
The other corner-case we need to worry about is *empty* allocations. There will
The other corner-case we need to worry about is empty allocations. There will
be two kinds of empty allocations we need to worry about: `cap = 0` for all T,
and `cap > 0` for zero-sized types.
@ -165,9 +165,9 @@ protected from being allocated anyway (a whole 4k, on many platforms).
However what about for positive-sized types? That one's a bit trickier. In
principle, you can argue that offsetting by 0 gives LLVM no information: either
there's an element before the address, or after it, but it can't know which.
there's an element before the address or after it, but it can't know which.
However we've chosen to conservatively assume that it may do bad things. As
such we *will* guard against this case explicitly.
such we will guard against this case explicitly.
*Phew*

@ -130,7 +130,7 @@ impl<'a, T> Drop for Drain<'a, T> {
impl<T> Vec<T> {
pub fn drain(&mut self) -> Drain<T> {
// this is a mem::forget safety thing. If Drain is forgotten, we just
// leak the whole Vec's contents. Also we need to do this *eventually*
// leak the whole Vec's contents. Also we need to do this eventually
// anyway, so why not do it now?
self.len = 0;

@ -10,7 +10,7 @@ handling the case where the source and destination overlap (which will
definitely happen here).
If we insert at index `i`, we want to shift the `[i .. len]` to `[i+1 .. len+1]`
using the *old* len.
using the old len.
```rust,ignore
pub fn insert(&mut self, index: usize, elem: T) {

@ -21,8 +21,8 @@ read out the value pointed to at that end and move the pointer over by one. When
the two pointers are equal, we know we're done.
Note that the order of read and offset are reversed for `next` and `next_back`
For `next_back` the pointer is always *after* the element it wants to read next,
while for `next` the pointer is always *at* the element it wants to read next.
For `next_back` the pointer is always after the element it wants to read next,
while for `next` the pointer is always at the element it wants to read next.
To see why this is, consider the case where every element but one has been
yielded.
@ -124,7 +124,7 @@ impl<T> DoubleEndedIterator for IntoIter<T> {
```
Because IntoIter takes ownership of its allocation, it needs to implement Drop
to free it. However it *also* wants to implement Drop to drop any elements it
to free it. However it also wants to implement Drop to drop any elements it
contains that weren't yielded.

@ -32,14 +32,14 @@ pub fn push(&mut self, elem: T) {
Easy! How about `pop`? Although this time the index we want to access is
initialized, Rust won't just let us dereference the location of memory to move
the value out, because that *would* leave the memory uninitialized! For this we
the value out, because that would leave the memory uninitialized! For this we
need `ptr::read`, which just copies out the bits from the target address and
intrprets it as a value of type T. This will leave the memory at this address
*logically* uninitialized, even though there is in fact a perfectly good instance
logically uninitialized, even though there is in fact a perfectly good instance
of T there.
For `pop`, if the old len is 1, we want to read out of the 0th index. So we
should offset by the *new* len.
should offset by the new len.
```rust,ignore
pub fn pop(&mut self) -> Option<T> {

@ -2,7 +2,7 @@
It's time. We're going to fight the spectre that is zero-sized types. Safe Rust
*never* needs to care about this, but Vec is very intensive on raw pointers and
raw allocations, which are exactly the *only* two things that care about
raw allocations, which are exactly the two things that care about
zero-sized types. We need to be careful of two things:
* The raw allocator API has undefined behaviour if you pass in 0 for an
@ -22,7 +22,7 @@ So if the allocator API doesn't support zero-sized allocations, what on earth
do we store as our allocation? Why, `heap::EMPTY` of course! Almost every operation
with a ZST is a no-op since ZSTs have exactly one value, and therefore no state needs
to be considered to store or load them. This actually extends to `ptr::read` and
`ptr::write`: they won't actually look at the pointer at all. As such we *never* need
`ptr::write`: they won't actually look at the pointer at all. As such we never need
to change the pointer.
Note however that our previous reliance on running out of memory before overflow is

Loading…
Cancel
Save