diff --git a/conversions.md b/conversions.md
index b4bde01..2fd750b 100644
--- a/conversions.md
+++ b/conversions.md
@@ -1,7 +1,77 @@
 % Type Conversions
 
+At the end of the day, everything is just a pile of bits somewhere, and type systems
+are just there to help us use those bits right. Needing to reinterpret those piles
+of bits as different types is a common problem and Rust consequently gives you
+several ways to do that.
+
 # Safe Rust
 
+First we'll look at the ways that *Safe Rust* gives you to reinterpret values. The
+most trivial way to do this is to just destructure a value into its constituent
+parts and then build a value of a new type out of them, e.g.
+
+```rust
+struct Foo {
+    x: u32,
+    y: u16,
+}
+
+struct Bar {
+    a: u32,
+    b: u16,
+}
+
+fn reinterpret(foo: Foo) -> Bar {
+    let Foo { x, y } = foo;
+    Bar { a: x, b: y }
+}
+```
+
+But this is, at best, annoying to do. For common conversions, Rust provides
+more ergonomic alternatives.
+
+## Auto-Deref
+
+`Deref` is a trait that allows you to overload the unary `*` to specify a type
+you dereference to. This is largely only intended to be implemented by pointer
+types like `&`, `Box`, and `Rc`. The dot operator will perform automatic
+dereferencing, so that `foo.bar()` will work uniformly on `Foo`, `&Foo`, `&&Foo`,
+`&Rc<Box<Foo>>` and so on. The search bottoms out on the *first* match,
+so implementing methods on pointers is generally to be avoided, as it will shadow
+"actual" methods.
+
+## Coercions
+
+Types can be implicitly coerced in certain contexts. These changes are generally
+just *weakening* of types, largely focused around pointers. They mostly exist to make
+Rust "just work" in more cases. For instance,
+`&mut T` coerces to `&T`, and `&T` coerces to `*const T`. The most useful coercion you will
+actually think about is probably the general *Deref Coercion*: `&T` coerces to `&U` when
+`T: Deref<Target=U>`. This enables us to pass an `&String` where an `&str` is expected, for instance.
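
A minimal sketch of the coercions just described. `takes_str` is a hypothetical helper that exists only for this illustration; it is not part of the original text.

```rust
// Hypothetical helper: accepts a string slice and returns its length.
fn takes_str(s: &str) -> usize {
    s.len()
}

fn main() {
    let owned = String::from("hello");
    // Deref coercion: `&String` coerces to `&str` because
    // `String: Deref<Target = str>`.
    assert_eq!(takes_str(&owned), 5);

    let mut x = 7u32;
    // `&mut T` coerces to `&T` ...
    let shared: &u32 = &mut x;
    // ... and `&T` coerces to `*const T`.
    let raw: *const u32 = shared;
    assert!(!raw.is_null());
}
```

No `as` appears anywhere: each conversion happens implicitly at a site where the expected type is already known (a function argument or an annotated `let`).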
+
+## Casts
+
+Casts are a superset of coercions: every coercion can be explicitly invoked via a cast,
+but some changes require a cast. These "true casts" are generally regarded as dangerous or
+problematic actions. The set of true casts is actually quite small, and once again revolves
+largely around pointers. However, it also introduces the primary mechanism to convert between
+numeric types.
+
+* rawptr -> rawptr (e.g. `*mut T as *const T` or `*mut T as *mut U`)
+* rawptr <-> usize (e.g. `*mut T as usize` or `usize as *mut T`)
+* primitive -> primitive (e.g. `u32 as u8` or `u8 as u32`)
+* c-like enum -> integer/bool (e.g. `DaysOfWeek as u8`)
+* `u8` -> `char`
+
+
+## Conversion Traits
+
+For the full formal specification of all the kinds of coercions and coercion sites, see:
+https://github.com/rust-lang/rfcs/blob/master/text/0401-coercions.md
+
+
+
 * Coercions
 * Casts
 * Conversion Traits (Into/As/...)
diff --git a/data.md b/data.md
new file mode 100644
index 0000000..6fa6356
--- /dev/null
+++ b/data.md
@@ -0,0 +1,254 @@
+% Data Representation in Rust
+
+Low-level programming cares a lot about data layout. It's a big deal. It also pervasively
+influences the rest of the language, so we're going to start by digging into how data is
+represented in Rust.
+
+# The `rust` repr
+
+Rust gives you the following ways to lay out composite data:
+
+* structs (named product types)
+* tuples (anonymous product types)
+* arrays (homogeneous product types)
+* enums (named sum types -- tagged unions)
+
+For all these, individual fields are aligned to their preferred alignment.
+For primitives this is typically equal to
+their size. For instance, a `u32` will be aligned to a multiple of 32 bits, and a `u16` will
+be aligned to a multiple of 16 bits. Composite structures will have their size rounded
+up to be a multiple of the highest alignment required by their fields, and an alignment
+requirement equal to the highest alignment required by their fields.
+So for instance,
+
+```rust
+struct A {
+    a: u8,
+    c: u64,
+    b: u32,
+}
+```
+
+will have a size that is a multiple of 64 bits, and 64-bit alignment.
+
+There is *no indirection* for these types; all data is stored contiguously as you would
+expect in C. However, with the exception of arrays, the layout of data is not
+specified by default in Rust. Given the two following struct definitions:
+
+```rust
+struct A {
+    a: i32,
+    b: u64,
+}
+
+struct B {
+    x: i32,
+    b: u64,
+}
+```
+
+Rust *does* guarantee that two instances of A have their data laid out in exactly
+the same way. However, Rust *does not* guarantee that an instance of A has the same
+field ordering or padding as an instance of B (in practice there's no *particular*
+reason why they wouldn't, other than that it's not currently guaranteed).
+
+With A and B as written, this is basically nonsensical, but several other features
+of Rust make it desirable for the language to play with data layout in complex ways.
+
+For instance, consider this struct:
+
+```rust
+struct Foo<T, U> {
+    count: u16,
+    data1: T,
+    data2: U,
+}
+```
+
+Now consider the monomorphizations of `Foo<u16, u32>` and `Foo<u32, u16>`. If Rust lays out the
+fields in the order specified, we expect it to *pad* the values in the struct to satisfy
+their *alignment* requirements. So if Rust didn't reorder fields, we would expect Rust to
+produce the following:
+
+```rust
+// Illustrative expansion only: this is not valid Rust syntax.
+struct Foo<u16, u32> {
+    count: u16,
+    data1: u16,
+    data2: u32,
+}
+
+struct Foo<u32, u16> {
+    count: u16,
+    _pad1: u16,
+    data1: u32,
+    data2: u16,
+    _pad2: u16,
+}
+```
+
+The latter case quite simply wastes space. An optimal use of space therefore requires
+different monomorphizations to *have different field orderings*.
+
+**Note: this is a hypothetical optimization that is not yet implemented in Rust 1.0.0**
+
+Enums make this consideration even more complicated.
+Naively, an enum such as:
+
+```rust
+enum Foo {
+    A(u32),
+    B(u64),
+    C(u8),
+}
+```
+
+would be laid out as:
+
+```rust
+struct FooRepr {
+    data: u64, // this is *really* either a u64, u32, or u8 based on `tag`
+    tag: u8,   // 0 = A, 1 = B, 2 = C
+}
+```
+
+And indeed this is approximately how it would be laid out in general
+(modulo the size and position of `tag`). However, there are several cases where
+such a representation is inefficient. The classic case of this is Rust's
+"null pointer optimization". Given a pointer that is known to not be null
+(e.g. `&u32`), an enum can *store* a discriminant bit *inside* the pointer
+by using null as a special value. The net result is that
+`size_of::<Option<&T>>() == size_of::<&T>()`.
+
+There are many types in Rust that are, or contain, "not null" pointers such as `Box<T>`, `Vec<T>`,
+`String`, `&T`, and `&mut T`. Similarly, one can imagine nested enums pooling their tags into
+a single discriminant, as they are by definition known to have a limited range of valid values.
+In principle enums can use fairly elaborate algorithms to cache bits throughout nested types
+with special constrained representations. As such it is *especially* desirable that we leave
+enum layout unspecified today.
+
+# Dynamically Sized Types (DSTs)
+
+Rust also supports types without a statically known size. On the surface,
+this is a bit nonsensical: Rust must know the size of something in order to
+work with it. DSTs are generally produced as views, or through type-erasure
+of types that *do* have a known size. Due to their lack of a statically known
+size, these types can only exist *behind* some kind of pointer. They consequently
+produce a *fat* pointer consisting of the pointer and the information that
+*completes* them.
+
+For instance, the slice type, `[T]`, is some statically unknown number of elements
+stored contiguously. `&[T]` consequently consists of a `(&T, usize)` pair that specifies
+where the slice starts, and how many elements it contains.
+Similarly, trait objects
+support interface-oriented type erasure through a `(data_ptr, vtable_ptr)` pair.
+
+Structs can actually store a single DST directly as their last field, but this
+makes them a DST as well:
+
+```rust
+// Can't be stored on the stack directly
+struct Foo {
+    info: u32,
+    data: [u8],
+}
+```
+
+# Zero Sized Types (ZSTs)
+
+Rust actually allows types to be specified that occupy *no* space:
+
+```rust
+struct Foo; // No fields = no size
+enum Bar {} // No variants = no size
+
+// All fields have no size = no size
+struct Baz {
+    foo: Foo,
+    bar: Bar,
+    qux: (), // empty tuple has no size
+}
+```
+
+On their own, ZSTs are, for obvious reasons, pretty useless. However,
+as with many curious layout choices in Rust, their potential is realized in a generic
+context.
+
+Rust largely understands that any operation that produces or stores a ZST
+can be reduced to a no-op. For instance, a `HashSet<T>` can be efficiently implemented
+as a thin wrapper around `HashMap<T, ()>` because all the operations `HashMap` normally
+does to store and retrieve values will be completely stripped in monomorphization.
+
+Similarly `Result<(), ()>` and `Option<()>` are effectively just fancy `bool`s.
+
+Safe code need not worry about ZSTs, but *unsafe* code must be careful about the
+consequences of types with no size. In particular, pointer offsets are no-ops, and
+standard allocators (including jemalloc, the one used by Rust) generally consider
+passing in `0` as the size to be Undefined Behaviour.
+
+# Drop Flags
+
+For unfortunate legacy implementation reasons, Rust as of 1.0.0 will do a nasty trick to
+any type that implements the `Drop` trait (has a destructor): it will insert a secret field
+in the type.
+That is,
+
+```rust
+struct Foo {
+    a: u32,
+    b: u32,
+}
+
+impl Drop for Foo {
+    fn drop(&mut self) { }
+}
+```
+
+will cause Foo to secretly become:
+
+```rust
+struct Foo {
+    a: u32,
+    b: u32,
+    _drop_flag: u8,
+}
+```
+
+For details as to *why* this is done, and how to make it not happen, check out
+[SOME OTHER SECTION].
+
+# Alternative representations
+
+Rust allows you to specify alternative data layout strategies from the default Rust
+one.
+
+# repr(C)
+
+This is the most important `repr`. It has fairly simple intent: do what C does.
+The order, size, and alignment of fields is exactly what you would expect from
+C or C++. Any type you expect to pass through an FFI boundary should have `repr(C)`,
+as C is the lingua franca of the programming world. However, this is also necessary
+to soundly do more elaborate tricks with data layout such as reinterpreting values
+as a different type.
+
+However, the interaction with Rust's more exotic data layout features must be kept
+in mind. Due to its dual purpose as both a "for FFI" and a "for layout control" attribute,
+`repr(C)` can be applied to types that will be nonsensical or problematic if passed through
+the FFI boundary.
+
+* ZSTs are still zero-sized, even though this is not a standard behaviour
+in C, and is explicitly contrary to the behaviour of an empty type in C++, which
+still consumes a byte of space.
+
+* DSTs are not a concept in C.
+
+* **The drop flag will still be added**
+
+* This is equivalent to repr(u32) for enums (see below)
+
+# repr(packed)
+
+`repr(packed)` forces Rust to strip any padding it would normally apply.
+This may improve the memory footprint of a type, but will have negative
+side-effects from "field access is heavily penalized" to "completely breaks
+everything" based on target platform.
+
+# repr(u8), repr(u16), repr(u32), repr(u64)
+
+These specify the size to make a c-like enum (one which has no values in its variants).
+
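
Several of the layout claims in this document can be spot-checked with `std::mem::size_of` and `std::mem::align_of`. The sketch below is illustrative only: `A` mirrors the earlier example, while `Small` and `Packed` are hypothetical types invented for the check. Only properties that do not depend on the target platform are asserted.

```rust
use std::mem::{align_of, size_of};

// Mirrors the earlier `A` example, with C layout requested explicitly.
#[allow(dead_code)]
#[repr(C)]
struct A {
    a: u8,
    c: u64,
    b: u32,
}

// Hypothetical c-like enum stored in a single byte.
#[allow(dead_code)]
#[repr(u8)]
enum Small {
    X = 1,
    Y = 2,
}

// Hypothetical packed struct: no padding between or after the fields.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn main() {
    // A composite takes the highest alignment of its fields, and its
    // size is rounded up to a multiple of that alignment.
    assert_eq!(align_of::<A>(), align_of::<u64>());
    assert_eq!(size_of::<A>() % align_of::<A>(), 0);

    // repr(u8) fixes the enum's size at one byte.
    assert_eq!(size_of::<Small>(), 1);
    assert_eq!(Small::Y as u8, 2);

    // repr(packed) strips padding: 1 + 4 bytes.
    assert_eq!(size_of::<Packed>(), 5);

    // Null pointer optimization and zero-sized types.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    assert_eq!(size_of::<()>(), 0);
}
```

The exact size of `A` is deliberately not asserted, because the alignment of `u64` differs between targets.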