Merge pull request #106 from Gankro/2018-cleanups-1

cleanups for Rust 2018
rust-1.31.0-2
Steve Klabnik 6 years ago committed by GitHub
commit b1f40cf12d
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -2,20 +2,20 @@
#### The Dark Arts of Advanced and Unsafe Rust Programming #### The Dark Arts of Advanced and Unsafe Rust Programming
# NOTE: This is a draft document that discusses several unstable aspects of Rust, and may contain serious errors or outdated information.
> Instead of the programs I had hoped for, there came only a shuddering blackness > Instead of the programs I had hoped for, there came only a shuddering blackness
and ineffable loneliness; and I saw at last a fearful truth which no one had and ineffable loneliness; and I saw at last a fearful truth which no one had
ever dared to breathe before — the unwhisperable secret of secrets — The fact ever dared to breathe before — the unwhisperable secret of secrets — The fact
that this language of stone and stridor is not a sentient perpetuation of Rust that this language of stone and stridor is not a sentient perpetuation of Rust
as London is of Old London and Paris of Old Paris, but that it is in fact as London is of Old London and Paris of Old Paris, but that it is in fact
quite unsafe, its sprawling body imperfectly embalmed and infested with queer quite `unsafe`, its sprawling body imperfectly embalmed and infested with queer
animate things which have nothing to do with it as it was in compilation. animate things which have nothing to do with it as it was in compilation.
This book digs into all the awful details that are necessary to understand in This book digs into all the awful details that you need to understand when
order to write correct Unsafe Rust programs. Due to the nature of this problem, writing Unsafe Rust programs.
it may lead to unleashing untold horrors that shatter your psyche into a billion
infinitesimal fragments of despair. > THE KNOWLEDGE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF UNLEASHING INDESCRIBABLE HORRORS THAT
SHATTER YOUR PSYCHE AND SET YOUR MIND ADRIFT IN THE UNKNOWABLY INFINITE COSMOS.
Should you wish a long and happy career of writing Rust programs, you should Should you wish a long and happy career of writing Rust programs, you should
turn back now and forget you ever saw this book. It is not necessary. However turn back now and forget you ever saw this book. It is not necessary. However

@ -3,3 +3,17 @@
Low-level programming cares a lot about data layout. It's a big deal. It also Low-level programming cares a lot about data layout. It's a big deal. It also
pervasively influences the rest of the language, so we're going to start by pervasively influences the rest of the language, so we're going to start by
digging into how data is represented in Rust. digging into how data is represented in Rust.
This chapter is ideally in agreement with, and rendered redundant by,
the [Type Layout section of the Reference][ref-type-layout]. When this
book was first written, the reference was in complete disrepair, and the
Rustonomicon was attempting to serve as a partial replacement for the reference.
This is no longer the case, so this whole chapter can ideally be deleted.
We'll keep this chapter around for a bit longer, but ideally you should be
contributing any new facts or improvements to the Reference instead.
[ref-type-layout]: ../reference/type-layout.html

@ -1,7 +1,7 @@
# Exotically Sized Types # Exotically Sized Types
Most of the time, we think in terms of types with a fixed, positive size. This Most of the time, we expect types to have a statically known and positive size.
is not always the case, however. This isn't always the case in Rust.
@ -9,47 +9,80 @@ is not always the case, however.
# Dynamically Sized Types (DSTs) # Dynamically Sized Types (DSTs)
Rust in fact supports Dynamically Sized Types (DSTs): types without a statically Rust supports Dynamically Sized Types (DSTs): types without a statically
known size or alignment. On the surface, this is a bit nonsensical: Rust *must* known size or alignment. On the surface, this is a bit nonsensical: Rust *must*
know the size and alignment of something in order to correctly work with it! In know the size and alignment of something in order to correctly work with it! In
this regard, DSTs are not normal types. Due to their lack of a statically known this regard, DSTs are not normal types. Because they lack a statically known
size, these types can only exist behind some kind of pointer. Any pointer to a size, these types can only exist behind a pointer. Any pointer to a
DST consequently becomes a *fat* pointer consisting of the pointer and the DST consequently becomes a *wide* pointer consisting of the pointer and the
information that "completes" them (more on this below). information that "completes" them (more on this below).
There are two major DSTs exposed by the language: trait objects, and slices. There are two major DSTs exposed by the language:
* trait objects: `dyn MyTrait`
* slices: `[T]`, `str`, and others
A trait object represents some type that implements the traits it specifies. A trait object represents some type that implements the traits it specifies.
The exact original type is *erased* in favor of runtime reflection The exact original type is *erased* in favor of runtime reflection
with a vtable containing all the information necessary to use the type. with a vtable containing all the information necessary to use the type.
This is the information that completes a trait object: a pointer to its vtable. The information that completes a trait object pointer is the vtable pointer.
The runtime size of the pointee can be dynamically requested from the vtable.
A slice is simply a view into some contiguous storage -- typically an array or A slice is simply a view into some contiguous storage -- typically an array or
`Vec`. The information that completes a slice is just the number of elements `Vec`. The information that completes a slice pointer is just the number of elements
it points to. it points to. The runtime size of the pointee is just the statically known size
of an element multiplied by the number of elements.
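As a rough illustration of the two kinds of "completing" information, the sketch below compares pointer sizes and asks for pointee sizes at runtime. The `Shape` trait, the `Square` type, and both helper functions are made up for this example, and the two-word size of wide pointers is what current compilers produce rather than something this chapter promises.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical trait and type, used only for illustration.
trait Shape {
    fn area(&self) -> u32;
}

struct Square(u32);

impl Shape for Square {
    fn area(&self) -> u32 {
        self.0 * self.0
    }
}

// Size in bytes of the slice a wide pointer refers to:
// element size multiplied by the length stored in the pointer.
fn slice_bytes(s: &[u16]) -> usize {
    size_of_val(s)
}

// Size in bytes of the erased type behind a trait object,
// looked up through the vtable at runtime.
fn object_bytes(s: &dyn Shape) -> usize {
    size_of_val(s)
}

fn main() {
    // A thin pointer is one word; wide pointers carry an extra word.
    assert_eq!(size_of::<&u16>(), size_of::<usize>());
    assert_eq!(size_of::<&[u16]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&dyn Shape>(), 2 * size_of::<usize>());

    let data = [1u16, 2, 3];
    assert_eq!(slice_bytes(&data), 3 * size_of::<u16>());

    let sq = Square(4);
    assert_eq!(object_bytes(&sq), size_of::<Square>());
    assert_eq!(sq.area(), 16);
}
```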
Structs can actually store a single DST directly as their last field, but this Structs can actually store a single DST directly as their last field, but this
makes them a DST as well: makes them a DST as well:
```rust ```rust
// Can't be stored on the stack directly // Can't be stored on the stack directly
struct Foo { struct MySuperSlice {
info: u32, info: u32,
data: [u8], data: [u8],
} }
``` ```
Although such a type is largely useless without a way to construct it. Currently the
only properly supported way to create a custom DST is by making your type generic
and performing an *unsizing coercion*:
```rust
struct MySuperSliceable<T: ?Sized> {
info: u32,
data: T
}
fn main() {
let sized: MySuperSliceable<[u8; 8]> = MySuperSliceable {
info: 17,
data: [0; 8],
};
let dynamic: &MySuperSliceable<[u8]> = &sized;
// prints: "17 [0, 0, 0, 0, 0, 0, 0, 0]"
println!("{} {:?}", dynamic.info, &dynamic.data);
}
```
(Yes, custom DSTs are a largely half-baked feature for now.)
# Zero Sized Types (ZSTs) # Zero Sized Types (ZSTs)
Rust actually allows types to be specified that occupy no space: Rust also allows types to be specified that occupy no space:
```rust ```rust
struct Foo; // No fields = no size struct Nothing; // No fields = no size
// All fields have no size = no size // All fields have no size = no size
struct Baz { struct LotsOfNothing {
foo: Foo, foo: Nothing,
qux: (), // empty tuple has no size qux: (), // empty tuple has no size
baz: [u8; 0], // empty array has no size baz: [u8; 0], // empty array has no size
} }
@ -63,7 +96,7 @@ make sense -- it doesn't occupy any space. Also there's only one value of that
type, so anything that loads it can just produce it from the aether -- which is type, so anything that loads it can just produce it from the aether -- which is
also a no-op since it doesn't occupy any space. also a no-op since it doesn't occupy any space.
One of the most extreme example's of this is Sets and Maps. Given a One of the most extreme examples of this is Sets and Maps. Given a
`Map<Key, Value>`, it is common to implement a `Set<Key>` as just a thin wrapper `Map<Key, Value>`, it is common to implement a `Set<Key>` as just a thin wrapper
around `Map<Key, UselessJunk>`. In many languages, this would necessitate around `Map<Key, UselessJunk>`. In many languages, this would necessitate
allocating space for UselessJunk and doing work to store and load UselessJunk allocating space for UselessJunk and doing work to store and load UselessJunk
@ -78,9 +111,8 @@ support values.
Safe code need not worry about ZSTs, but *unsafe* code must be careful about the Safe code need not worry about ZSTs, but *unsafe* code must be careful about the
consequence of types with no size. In particular, pointer offsets are no-ops, consequence of types with no size. In particular, pointer offsets are no-ops,
and standard allocators (including jemalloc, the one used by default in Rust) and standard allocators may return `null` when a zero-sized allocation is
may return `nullptr` when a zero-sized allocation is requested, which is requested, which is indistinguishable from the out of memory result.
indistinguishable from out of memory.
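To make the "pointer offsets are no-ops" point concrete, here is a small sketch that measures the distance between neighbouring elements. The `zst_sizes` and `stride` helpers are hypothetical names introduced for this example only.

```rust
use std::mem::size_of;

struct Nothing;

// Sizes of three zero-sized types.
fn zst_sizes() -> (usize, usize, usize) {
    (size_of::<Nothing>(), size_of::<()>(), size_of::<[u8; 0]>())
}

// Number of bytes between the first and second element of an array.
fn stride<T>(pair: &[T; 2]) -> usize {
    let first = &pair[0] as *const T as usize;
    let second = &pair[1] as *const T as usize;
    second - first
}

fn main() {
    assert_eq!(zst_sizes(), (0, 0, 0));
    // Elements of a ZST array all live at the same address, so stepping
    // from one to the next moves the pointer by zero bytes.
    assert_eq!(stride(&[Nothing, Nothing]), 0);
    // For comparison, `u32` elements are four bytes apart.
    assert_eq!(stride(&[1u32, 2u32]), 4);
}
```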
@ -97,7 +129,7 @@ enum Void {} // No variants = EMPTY
``` ```
Empty types are even more marginal than ZSTs. The primary motivating example for Empty types are even more marginal than ZSTs. The primary motivating example for
Void types is type-level unreachability. For instance, suppose an API needs to an empty type is type-level unreachability. For instance, suppose an API needs to
return a Result in general, but a specific case actually is infallible. It's return a Result in general, but a specific case actually is infallible. It's
actually possible to communicate this at the type level by returning a actually possible to communicate this at the type level by returning a
`Result<T, Void>`. Consumers of the API can confidently unwrap such a Result `Result<T, Void>`. Consumers of the API can confidently unwrap such a Result
@ -125,9 +157,35 @@ But this trick doesn't work yet.
One final subtle detail about empty types is that raw pointers to them are One final subtle detail about empty types is that raw pointers to them are
actually valid to construct, but dereferencing them is Undefined Behavior actually valid to construct, but dereferencing them is Undefined Behavior
because that doesn't actually make sense. That is, you could model C's `void *` because that wouldn't make sense.
type with `*const Void`, but this doesn't necessarily gain anything over using
e.g. `*const ()`, which *is* safe to randomly dereference. We recommend against modelling C's `void*` type with `*const Void`.
A lot of people started doing that but quickly ran into trouble because
Rust doesn't really have any safety guards against trying to instantiate
empty types with unsafe code, and if you do it, it's Undefined Behaviour.
This was especially problematic because developers had a habit of converting
raw pointers to references and `&Void` is *also* Undefined Behaviour to
construct.
`*const ()` (or equivalent) works reasonably well for `void*`, and can be made
into a reference without any safety problems. It still doesn't prevent you from
trying to read or write values, but at least it compiles to a no-op instead
of UB.
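A minimal sketch of the `*const ()` approach, assuming a C-style API that hands back an opaque user-data pointer. The `erase` and `restore` helpers are hypothetical and only illustrate the casts involved.

```rust
// Hypothetical helper that erases a pointer's type, the way a C API
// taking `void*` user data might require.
fn erase(value: &u32) -> *const () {
    value as *const u32 as *const ()
}

// Safety: `ptr` must have come from `erase`, and the original `u32`
// must still be alive for the chosen lifetime.
unsafe fn restore<'a>(ptr: *const ()) -> &'a u32 {
    unsafe { &*(ptr as *const u32) }
}

fn main() {
    let x = 42u32;
    let opaque = erase(&x);

    // Unlike `&Void`, a `&()` is fine to create: `()` is a real
    // zero-sized type with exactly one value.
    let _unit: &() = unsafe { &*opaque };

    let back = unsafe { restore(opaque) };
    assert_eq!(*back, 42);
}
```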
# Extern Types
There is [an accepted RFC][extern-types] to add proper types with an unknown size,
called *extern types*, which would let Rust developers model things like C's `void*`
and other "declared but never defined" types more accurately. However as of
Rust 2018, the feature is stuck in limbo over how `size_of::<MyExternType>()`
should behave.
[dst-issue]: https://github.com/rust-lang/rust/issues/26403 [dst-issue]: https://github.com/rust-lang/rust/issues/26403
[extern-types]: https://github.com/rust-lang/rfcs/blob/master/text/1861-extern-types.md

@ -15,18 +15,26 @@ or C++. Any type you expect to pass through an FFI boundary should have
necessary to soundly do more elaborate tricks with data layout such as necessary to soundly do more elaborate tricks with data layout such as
reinterpreting values as a different type. reinterpreting values as a different type.
However, the interaction with Rust's more exotic data layout features must be We strongly recommend using [rust-bindgen][] and/or [cbindgen][] to manage your FFI
boundaries for you. The Rust team works closely with those projects to ensure
that they work robustly and are compatible with current and future guarantees
about type layouts and reprs.
The interaction of `repr(C)` with Rust's more exotic data layout features must be
kept in mind. Due to its dual purpose as "for FFI" and "for layout control", kept in mind. Due to its dual purpose as "for FFI" and "for layout control",
`repr(C)` can be applied to types that will be nonsensical or problematic if `repr(C)` can be applied to types that will be nonsensical or problematic if
passed through the FFI boundary. passed through the FFI boundary.
* ZSTs are still zero-sized, even though this is not a standard behavior in * ZSTs are still zero-sized, even though this is not a standard behavior in
C, and is explicitly contrary to the behavior of an empty type in C++, which C, and is explicitly contrary to the behavior of an empty type in C++, which
still consumes a byte of space. says they should still consume a byte of space.
* DST pointers (fat pointers), tuples, and enums with fields are not a concept * DST pointers (wide pointers) and tuples are not a concept
in C, and as such are never FFI-safe. in C, and as such are never FFI-safe.
* Enums with fields also aren't a concept in C or C++, but a valid bridging
of the types [is defined][really-tagged].
* If `T` is an [FFI-safe non-nullable pointer * If `T` is an [FFI-safe non-nullable pointer
type](ffi.html#the-nullable-pointer-optimization), type](ffi.html#the-nullable-pointer-optimization),
`Option<T>` is guaranteed to have the same layout and ABI as `T` and is `Option<T>` is guaranteed to have the same layout and ABI as `T` and is
@ -36,13 +44,13 @@ still consumes a byte of space.
* Tuple structs are like structs with regards to `repr(C)`, as the only * Tuple structs are like structs with regards to `repr(C)`, as the only
difference from a struct is that the fields aren't named. difference from a struct is that the fields aren't named.
* This is equivalent to one of `repr(u*)` (see the next section) for enums. The * `repr(C)` is equivalent to one of `repr(u*)` (see the next section) for
chosen size is the default enum size for the target platform's C application fieldless enums. The chosen size is the default enum size for the target platform's C
binary interface (ABI). Note that enum representation in C is implementation application binary interface (ABI). Note that enum representation in C is implementation
defined, so this is really a "best guess". In particular, this may be incorrect defined, so this is really a "best guess". In particular, this may be incorrect
when the C code of interest is compiled with certain flags. when the C code of interest is compiled with certain flags.
* Field-less enums with `repr(C)` or `repr(u*)` still may not be set to an * Fieldless enums with `repr(C)` or `repr(u*)` still may not be set to an
integer value without a corresponding variant, even though this is integer value without a corresponding variant, even though this is
permitted behavior in C or C++. It is undefined behavior to (unsafely) permitted behavior in C or C++. It is undefined behavior to (unsafely)
construct an instance of an enum that does not match one of its construct an instance of an enum that does not match one of its
@ -58,12 +66,12 @@ be additional zero-sized fields). The effect is that the layout and ABI of the
whole struct is guaranteed to be the same as that one field. whole struct is guaranteed to be the same as that one field.
The goal is to make it possible to transmute between the single field and the The goal is to make it possible to transmute between the single field and the
struct. An example of that is the [`UnsafeCell`], which can be transmuted into struct. An example of that is [`UnsafeCell`], which can be transmuted into
the type it wraps. the type it wraps.
Also, passing the struct through FFI where the inner field type is expected on Also, passing the struct through FFI where the inner field type is expected on
the other side is allowed. In particular, this is necessary for `struct the other side is guaranteed to work. In particular, this is necessary for `struct
Foo(f32)` to have the same ABI as `f32`. Foo(f32)` to always have the same ABI as `f32`.
More details are in the [RFC][rfc-transparent]. More details are in the [RFC][rfc-transparent].
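The guarantee can be sketched with a small newtype. `Meters`, `to_meters`, and `from_meters` are hypothetical names chosen for this example; only `repr(transparent)` itself comes from the text.

```rust
use std::mem::{align_of, size_of, transmute};

// Hypothetical newtype; the name is illustrative only.
#[repr(transparent)]
struct Meters(f32);

fn to_meters(raw: f32) -> Meters {
    // Relies on `repr(transparent)` giving `Meters` the layout of `f32`.
    unsafe { transmute::<f32, Meters>(raw) }
}

fn from_meters(m: Meters) -> f32 {
    unsafe { transmute::<Meters, f32>(m) }
}

fn main() {
    assert_eq!(size_of::<Meters>(), size_of::<f32>());
    assert_eq!(align_of::<Meters>(), align_of::<f32>());
    assert_eq!(from_meters(to_meters(1.5)), 1.5);
    assert_eq!(to_meters(2.0).0, 2.0);
}
```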
@ -71,20 +79,22 @@ More details are in the [RFC][rfc-transparent].
# repr(u*), repr(i*) # repr(u*), repr(i*)
These specify the size to make a field-less enum. If the discriminant overflows These specify the size to make a fieldless enum. If the discriminant overflows
the integer it has to fit in, it will produce a compile-time error. You can the integer it has to fit in, it will produce a compile-time error. You can
manually ask Rust to allow this by setting the overflowing element to explicitly manually ask Rust to allow this by setting the overflowing element to explicitly
be 0. However Rust will not allow you to create an enum where two variants have be 0. However Rust will not allow you to create an enum where two variants have
the same discriminant. the same discriminant.
The term "field-less enum" only means that the enum doesn't have data in any The term "fieldless enum" only means that the enum doesn't have data in any
of its variants. A field-less enum without a `repr(u*)` or `repr(C)` is of its variants. A fieldless enum without a `repr(u*)` or `repr(C)` is
still a Rust native type, and does not have a stable ABI representation. still a Rust native type, and does not have a stable ABI representation.
Adding a `repr` causes it to be treated exactly like the specified Adding a `repr` causes it to be treated exactly like the specified
integer size for ABI purposes. integer size for ABI purposes.
Any enum with fields is a Rust type with no guaranteed ABI (even if the If the enum has fields, the effect is similar to the effect of `repr(C)`
only data is `PhantomData` or something else with zero size). in that there is a defined layout of the type. This makes it possible to
pass the enum to C code, or access the type's raw representation and directly
manipulate its tag and fields. See [the RFC][really-tagged] for details.
Adding an explicit `repr` to an enum suppresses the null-pointer Adding an explicit `repr` to an enum suppresses the null-pointer
optimization. optimization.
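The following sketch shows both effects: a fieldless enum taking the size of its `repr` integer, and an explicit `repr` switching off the null-pointer optimization. The type names are made up for illustration, and the size comparison for `MyOption` reflects what current compilers do.

```rust
use std::mem::size_of;

// Fieldless enum with a fixed one-byte representation.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy)]
enum Color {
    Red = 1,
    Green = 2,
    Blue = 255, // largest discriminant that still fits in a u8
}

// Option-like enums without and with an explicit repr.
#[allow(dead_code)]
enum MyOption<T> {
    Some(T),
    None,
}

#[allow(dead_code)]
#[repr(u8)]
enum MyReprOption<T> {
    Some(T),
    None,
}

fn discriminant(c: Color) -> u8 {
    c as u8
}

fn main() {
    assert_eq!(size_of::<Color>(), 1);
    assert_eq!(discriminant(Color::Blue), 255);

    // Without a repr, the tag is hidden in the pointer's null value...
    assert_eq!(size_of::<MyOption<&u16>>(), size_of::<&u16>());
    // ...with one, an explicit `u8` tag is stored alongside the pointer.
    assert!(size_of::<MyReprOption<&u16>>() > size_of::<&u16>());
}
```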
@ -107,13 +117,16 @@ compiler might be able to paper over alignment issues with shifts and masks.
However if you take a reference to a packed field, it's unlikely that the However if you take a reference to a packed field, it's unlikely that the
compiler will be able to emit code to avoid an unaligned load. compiler will be able to emit code to avoid an unaligned load.
**[As of Rust 1.30.0 this still can cause undefined behavior.][ub loads]** **[As of Rust 2018, this still can cause undefined behavior.][ub loads]**
`repr(packed)` is not to be used lightly. Unless you have extreme requirements, `repr(packed)` is not to be used lightly. Unless you have extreme requirements,
this should not be used. this should not be used.
This repr is a modifier on `repr(C)` and `repr(Rust)`. This repr is a modifier on `repr(C)` and `repr(Rust)`.
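For a sense of what packing changes, the sketch below compares a padded and a packed version of the same struct. `Padded`, `Packed`, and `read_b` are hypothetical names; the example deliberately reads the packed field by value and never takes a reference to it.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn read_b(p: &Packed) -> u32 {
    // Copy the field out by value; creating `&p.b` would be a
    // reference to a possibly unaligned `u32`.
    p.b
}

fn main() {
    // Packing removes the padding after `a` and lowers the alignment to 1.
    assert_eq!(size_of::<Packed>(), 5);
    assert_eq!(align_of::<Packed>(), 1);

    // The unpacked version is padded out to a multiple of its alignment
    // (typically 8 bytes in total).
    assert_eq!(size_of::<Padded>() % align_of::<Padded>(), 0);
    assert!(size_of::<Padded>() >= size_of::<Packed>());

    let p = Packed { a: 1, b: 0xDEAD_BEEF };
    assert_eq!(read_b(&p), 0xDEAD_BEEF);
}
```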
# repr(align(n)) # repr(align(n))
`repr(align(n))` (where `n` is a power of two) forces the type to have an `repr(align(n))` (where `n` is a power of two) forces the type to have an
@ -126,8 +139,15 @@ kinds of concurrent code).
This is a modifier on `repr(C)` and `repr(Rust)`. It is incompatible with This is a modifier on `repr(C)` and `repr(Rust)`. It is incompatible with
`repr(packed)`. `repr(packed)`.
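A short sketch of forcing a larger alignment, assuming a 64-byte cache line (a common but not universal figure). `PaddedCounter` is a hypothetical type used only for this example.

```rust
use std::mem::{align_of, size_of};

// Hypothetical counter padded to its own 64-byte cache line
// (64 is an assumed line size, not a universal constant).
#[repr(align(64))]
struct PaddedCounter {
    value: u32,
}

fn main() {
    assert_eq!(align_of::<PaddedCounter>(), 64);
    // Size is rounded up to a multiple of the alignment.
    assert_eq!(size_of::<PaddedCounter>(), 64);

    // Neighbouring array elements are therefore 64 bytes apart.
    let counters = [PaddedCounter { value: 0 }, PaddedCounter { value: 1 }];
    let a = &counters[0] as *const PaddedCounter as usize;
    let b = &counters[1] as *const PaddedCounter as usize;
    assert_eq!(b - a, 64);
    assert_eq!(counters[1].value, 1);
}
```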
[reference]: https://github.com/rust-rfcs/unsafe-code-guidelines/tree/master/reference/src/representation [reference]: https://github.com/rust-rfcs/unsafe-code-guidelines/tree/master/reference/src/representation
[drop flags]: drop-flags.html [drop flags]: drop-flags.html
[ub loads]: https://github.com/rust-lang/rust/issues/27060 [ub loads]: https://github.com/rust-lang/rust/issues/27060
[`UnsafeCell`]: ../std/cell/struct.UnsafeCell.html [`UnsafeCell`]: ../std/cell/struct.UnsafeCell.html
[rfc-transparent]: https://github.com/rust-lang/rfcs/blob/master/text/1758-repr-transparent.md [rfc-transparent]: https://github.com/rust-lang/rfcs/blob/master/text/1758-repr-transparent.md
[really-tagged]: https://github.com/rust-lang/rfcs/blob/master/text/2195-really-tagged-unions.md
[rust-bindgen]: https://rust-lang-nursery.github.io/rust-bindgen/
[cbindgen]: https://github.com/eqrion/cbindgen

@ -2,12 +2,14 @@
First and foremost, all types have an alignment specified in bytes. The First and foremost, all types have an alignment specified in bytes. The
alignment of a type specifies what addresses are valid to store the value at. A alignment of a type specifies what addresses are valid to store the value at. A
value of alignment `n` must only be stored at an address that is a multiple of value with alignment `n` must only be stored at an address that is a multiple of
`n`. So alignment 2 means you must be stored at an even address, and 1 means `n`. So alignment 2 means you must be stored at an even address, and 1 means
that you can be stored anywhere. Alignment is at least 1, and always a power that you can be stored anywhere. Alignment is at least 1, and always a power
of 2. Most primitives are generally aligned to their size, although this is of 2.
platform-specific behavior. In particular, on x86 `u64` and `f64` may be only
aligned to 32 bits. Primitives are usually aligned to their size, although this is
platform-specific behavior. For example, on x86 `u64` and `f64` are often
aligned to 4 bytes (32 bits).
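The alignment rules in this section can be checked mechanically. The two helpers below are hypothetical and avoid hard-coding any platform-specific alignment values.

```rust
use std::mem::{align_of, size_of};

// Checks two rules for a type: alignment is a power of two (which also
// makes it at least 1), and size is a multiple of alignment.
fn obeys_alignment_rules<T>() -> bool {
    let align = align_of::<T>();
    align.is_power_of_two() && size_of::<T>() % align == 0
}

// Whether `addr` is a valid place to store a `T`.
fn is_aligned_for<T>(addr: usize) -> bool {
    addr % align_of::<T>() == 0
}

fn main() {
    assert!(obeys_alignment_rules::<u8>());
    assert!(obeys_alignment_rules::<u64>());
    assert!(obeys_alignment_rules::<(u8, u32, u16)>());

    // Alignment 1: any address will do.
    assert!(is_aligned_for::<u8>(7));

    // A real `u64` always sits at a suitably aligned address,
    // whatever the platform's alignment for `u64` happens to be.
    let x = 5u64;
    assert!(is_aligned_for::<u64>(&x as *const u64 as usize));
}
```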
A type's size must always be a multiple of its alignment. This ensures that an A type's size must always be a multiple of its alignment. This ensures that an
array of that type may always be indexed by offsetting by a multiple of its array of that type may always be indexed by offsetting by a multiple of its
@ -20,12 +22,12 @@ Rust gives you the following ways to lay out composite data:
* tuples (anonymous product types) * tuples (anonymous product types)
* arrays (homogeneous product types) * arrays (homogeneous product types)
* enums (named sum types -- tagged unions) * enums (named sum types -- tagged unions)
* unions (untagged) * unions (untagged unions)
An enum is said to be *field-less* if none of its variants have associated data. An enum is said to be *field-less* if none of its variants have associated data.
Composite structures will have an alignment equal to the maximum By default, composite structures have an alignment equal to the maximum
of their fields' alignment. Rust will consequently insert padding where of their fields' alignments. Rust will consequently insert padding where
necessary to ensure that all fields are properly aligned and that the overall necessary to ensure that all fields are properly aligned and that the overall
type's size is a multiple of its alignment. For instance: type's size is a multiple of its alignment. For instance:
@ -37,7 +39,7 @@ struct A {
} }
``` ```
will be 32-bit aligned on an architecture that aligns these primitives to their will be 32-bit aligned on a target that aligns these primitives to their
respective sizes. The whole struct will therefore have a size that is a multiple respective sizes. The whole struct will therefore have a size that is a multiple
of 32 bits. It may become: of 32 bits. It may become:
@ -64,8 +66,8 @@ struct A {
There is *no indirection* for these types; all data is stored within the struct, There is *no indirection* for these types; all data is stored within the struct,
as you would expect in C. However with the exception of arrays (which are as you would expect in C. However with the exception of arrays (which are
densely packed and in-order), the layout of data is not by default specified in densely packed and in-order), the layout of data is not specified by default.
Rust. Given the two following struct definitions: Given the two following struct definitions:
```rust ```rust
struct A { struct A {
@ -81,8 +83,7 @@ struct B {
Rust *does* guarantee that two instances of A have their data laid out in Rust *does* guarantee that two instances of A have their data laid out in
exactly the same way. However Rust *does not* currently guarantee that an exactly the same way. However Rust *does not* currently guarantee that an
instance of A has the same field ordering or padding as an instance of B, though instance of A has the same field ordering or padding as an instance of B.
in practice there's no reason why they wouldn't.
With A and B as written, this point would seem to be pedantic, but several other With A and B as written, this point would seem to be pedantic, but several other
features of Rust make it desirable for the language to play with data layout in features of Rust make it desirable for the language to play with data layout in
@ -119,7 +120,7 @@ struct Foo<u32, u16> {
} }
``` ```
The latter case quite simply wastes space. An optimal use of space therefore The latter case quite simply wastes space. An optimal use of space
requires different monomorphizations to have *different field orderings*. requires different monomorphizations to have *different field orderings*.
Enums make this consideration even more complicated. Naively, an enum such as: Enums make this consideration even more complicated. Naively, an enum such as:
@ -132,7 +133,7 @@ enum Foo {
} }
``` ```
would be laid out as: might be laid out as:
```rust ```rust
struct FooRepr { struct FooRepr {
@ -141,23 +142,22 @@ struct FooRepr {
} }
``` ```
And indeed this is approximately how it would be laid out in general (modulo the And indeed this is approximately how it would be laid out (modulo the
size and position of `tag`). size and position of `tag`).
However there are several cases where such a representation is inefficient. The However there are several cases where such a representation is inefficient. The
classic case of this is Rust's "null pointer optimization": an enum consisting classic case of this is Rust's "null pointer optimization": an enum consisting
of a single outer unit variant (e.g. `None`) and a (potentially nested) non- of a single outer unit variant (e.g. `None`) and a (potentially nested) non-
nullable pointer variant (e.g. `&T`) makes the tag unnecessary, because a null nullable pointer variant (e.g. `Some(&T)`) makes the tag unnecessary. A null
pointer value can safely be interpreted to mean that the unit variant is chosen pointer can safely be interpreted as the unit (`None`) variant. The net
instead. The net result is that, for example, `size_of::<Option<&T>>() == result is that, for example, `size_of::<Option<&T>>() == size_of::<&T>()`.
size_of::<&T>()`.
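The size claim is easy to check directly. In this sketch `option_is_free` is a hypothetical helper, and `NonNull` is included as one more non-nullable pointer type from the standard library.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// True when wrapping `T` in `Option` costs no extra space.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    // Non-nullable pointers: `None` is stored as the null value.
    assert!(option_is_free::<&u8>());
    assert!(option_is_free::<&mut u8>());
    assert!(option_is_free::<Box<u8>>());
    assert!(option_is_free::<NonNull<u8>>());

    // Every bit pattern of a `usize` is a valid value,
    // so a separate tag is needed.
    assert!(!option_is_free::<usize>());
}
```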
There are many types in Rust that are, or contain, non-nullable pointers such as There are many types in Rust that are, or contain, non-nullable pointers such as
`Box<T>`, `Vec<T>`, `String`, `&T`, and `&mut T`. Similarly, one can imagine `Box<T>`, `Vec<T>`, `String`, `&T`, and `&mut T`. Similarly, one can imagine
nested enums pooling their tags into a single discriminant, as they are by nested enums pooling their tags into a single discriminant, as they are by
definition known to have a limited range of valid values. In principle enums could definition known to have a limited range of valid values. In principle enums could
use fairly elaborate algorithms to cache bits throughout nested types with use fairly elaborate algorithms to store bits throughout nested types with
special constrained representations. As such it is *especially* desirable that forbidden values. As such it is *especially* desirable that
we leave enum layout unspecified today. we leave enum layout unspecified today.
[dst]: exotic-sizes.html#dynamically-sized-types-dsts [dst]: exotic-sizes.html#dynamically-sized-types-dsts
