% repr(Rust)

Rust gives you the following ways to lay out composite data:

* structs (named product types)
* tuples (anonymous product types)
* arrays (homogeneous product types)
* enums (named sum types -- tagged unions)

An enum is said to be *C-like* if none of its variants have associated data.

For all these, individual fields are aligned to their preferred alignment. For
primitives this is usually equal to their size. For instance, a u32 will be
aligned to a multiple of 32 bits, and a u16 will be aligned to a multiple of 16
bits. Note that some primitives may be emulated on some platforms, and as
such may have a smaller alignment than their size. For instance, a u64 on x86
may actually be emulated as a pair of u32s, and thus only have 32-bit alignment.
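
As a rough illustration, the alignment of a primitive can be queried with
`std::mem::align_of`. The helper `size_and_align` below is a hypothetical name
introduced for this sketch. The printed values are platform-dependent, which
is exactly the point made above about u64.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: returns (size, alignment) of `T`, both in bytes.
fn size_and_align<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    // Sizes are fixed by the language; alignments may vary by target.
    println!("u8:  {:?}", size_and_align::<u8>());
    println!("u16: {:?}", size_and_align::<u16>());
    println!("u32: {:?}", size_and_align::<u32>());
    // On some 32-bit targets the alignment of u64 is 4 rather than 8.
    println!("u64: {:?}", size_and_align::<u64>());
}
```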

Composite structures will have a preferred alignment equal to the maximum
of their fields' preferred alignment, and a size equal to a multiple of their
preferred alignment. This ensures that arrays of T can be correctly iterated
by offsetting by their size. So for instance,

```rust
struct A {
    a: u8,
    c: u32,
    b: u16,
}
```

will have a size that is a multiple of 32 bits, and 32-bit alignment.
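
A minimal sketch of how one might check this, assuming a target where u32 has
4-byte alignment. The exact size is deliberately not asserted, since it depends
on how the compiler chooses to order and pad the fields.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
struct A {
    a: u8,
    c: u32,
    b: u16,
}

fn main() {
    // Both values are reported in bytes, not bits.
    println!("size_of::<A>()  = {}", size_of::<A>());
    println!("align_of::<A>() = {}", align_of::<A>());
    // The size is always a multiple of the alignment, so that
    // consecutive elements of an array of A stay aligned.
    assert_eq!(size_of::<A>() % align_of::<A>(), 0);
}
```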

There is *no indirection* for these types; all data is stored contiguously as you would
expect in C. However, with the exception of arrays (which are densely packed and
in-order), the layout of data is not specified by default in Rust. Given the two
following struct definitions:

```rust
struct A {
    a: i32,
    b: u64,
}

struct B {
    x: i32,
    b: u64,
}
```

Rust *does* guarantee that two instances of A have their data laid out in exactly
the same way. However, Rust *does not* guarantee that an instance of A has the same
field ordering or padding as an instance of B (in practice there's no *particular*
reason why they wouldn't, other than that it's not currently guaranteed).

With A and B as written, this is basically nonsensical, but several other features
of Rust make it desirable for the language to play with data layout in complex ways.

For instance, consider this struct:

```rust
struct Foo<T, U> {
    count: u16,
    data1: T,
    data2: U,
}
```

Now consider the monomorphizations of `Foo<u32, u16>` and `Foo<u16, u32>`. If Rust lays out the
fields in the order specified, we expect it to *pad* the values in the struct to satisfy
their *alignment* requirements. So if Rust didn't reorder fields, we would expect Rust to
produce the following:

```rust,ignore
struct Foo<u16, u32> {
    count: u16,
    data1: u16,
    data2: u32,
}

struct Foo<u32, u16> {
    count: u16,
    _pad1: u16,
    data1: u32,
    data2: u16,
    _pad2: u16,
}
```

The latter case quite simply wastes space. An optimal use of space therefore requires
different monomorphizations to have *different field orderings*.

**Note: this is a hypothetical optimization that is not yet implemented in Rust 1.0**
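
The padding in the declaration-order layout can be observed by opting into
`#[repr(C)]`, which keeps fields in the order written. The sketch below assumes
a target where u16 is 2-byte aligned and u32 is 4-byte aligned. `FooC` is a
hypothetical name for the `#[repr(C)]` copy of `Foo`. What the default
representation reports depends on the compiler version, so it is only printed.

```rust
use std::mem::size_of;

// Declaration-order layout, as in the hypothetical expansion above.
#[allow(dead_code)]
#[repr(C)]
struct FooC<T, U> {
    count: u16,
    data1: T,
    data2: U,
}

// Default representation: the compiler is free to choose the layout.
#[allow(dead_code)]
struct Foo<T, U> {
    count: u16,
    data1: T,
    data2: U,
}

fn main() {
    // 2 + 2 + 4 bytes, no padding needed.
    println!("FooC<u16, u32>: {}", size_of::<FooC<u16, u32>>());
    // 2 + 2 (pad) + 4 + 2 + 2 (pad) bytes.
    println!("FooC<u32, u16>: {}", size_of::<FooC<u32, u16>>());
    // Unspecified; may or may not match the repr(C) sizes.
    println!("Foo<u16, u32>:  {}", size_of::<Foo<u16, u32>>());
    println!("Foo<u32, u16>:  {}", size_of::<Foo<u32, u16>>());
}
```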

Enums make this consideration even more complicated. Naively, an enum such as:

```rust
enum Foo {
    A(u32),
    B(u64),
    C(u8),
}
```

would be laid out as:

```rust
struct FooRepr {
    data: u64, // this is *really* either a u64, u32, or u8 based on `tag`
    tag: u8, // 0 = A, 1 = B, 2 = C
}
```

And indeed this is approximately how it would be laid out in general
(modulo the size and position of `tag`). However, there are several cases where
such a representation is inefficient. The classic case of this is Rust's
"null pointer optimization". Given a pointer that is known to not be null
(e.g. `&u32`), an enum can *store* a discriminant bit *inside* the pointer
by using null as a special value. The net result is that
`size_of::<Option<&T>>() == size_of::<&T>()`.
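
Both effects can be checked with `std::mem::size_of`. This is a small sketch
rather than a statement about the exact layout: it only relies on `Foo` needing
room for a tag beyond its largest payload, and on `Option` of a non-null
pointer type being the same size as the pointer itself.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Foo {
    A(u32),
    B(u64),
    C(u8),
}

fn main() {
    // The u64 payload leaves no spare bits, so the tag needs extra space.
    println!("size_of::<Foo>() = {}", size_of::<Foo>());
    assert!(size_of::<Foo>() > size_of::<u64>());

    // Null pointer optimization: None is represented as the null pointer.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    assert_eq!(size_of::<Option<Box<u32>>>(), size_of::<Box<u32>>());
}
```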

There are many types in Rust that are, or contain, "not null" pointers such as
`Box<T>`, `Vec<T>`, `String`, `&T`, and `&mut T`. Similarly, one can imagine
nested enums pooling their tags into a single discriminant, as they are by
definition known to have a limited range of valid values. In principle, enums can
use fairly elaborate algorithms to cache bits throughout nested types with
special constrained representations. As such, it is *especially* desirable that
we leave enum layout unspecified today.