From 5c74b691972d36d30e27f253e8e2673d9930316f Mon Sep 17 00:00:00 2001 From: James Munns Date: Fri, 8 Nov 2024 12:51:51 +0100 Subject: [PATCH 1/7] Initial draft of unsafe-set-discriminant --- text/0000-unsafe-set-discriminant.md | 299 +++++++++++++++++++++++++++ 1 file changed, 299 insertions(+) create mode 100644 text/0000-unsafe-set-discriminant.md diff --git a/text/0000-unsafe-set-discriminant.md b/text/0000-unsafe-set-discriminant.md new file mode 100644 index 00000000000..14b4fd1264f --- /dev/null +++ b/text/0000-unsafe-set-discriminant.md @@ -0,0 +1,299 @@ +- Feature Name: unsafe_set_discriminant +- Start Date: 2024-11-08 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +This RFC proposes a way to write the discriminant of an enum when building it "from scratch". This introduces two new library components, an unsafe `set_discriminant` function, and a `discriminant_of!` macro, which can be used with any enum, regardless of `repr` or any other details. + +# Motivation +[motivation]: #motivation + +At the moment, enums are the only "plain old data" type that cannot be initialized in-place. This is discussed in [this blog post][pods_from_scratch] in more detail, and was [discussed on Zulip]. This RFC is aimed at removing that gap. + +In-place construction is desirable for creating objects at their destination, often "by parts" (one piece or sub-component at a time) removing the need for potential on-stack construction, which can cause issues (particularly for large objects) with stack overflows, increased stack usage, or additional copies that cause performance impact. + +In-place construction is useful for a number of niche applications, including in-place deserialization, construction of items on the heap, pinned initialization (including [in the Linux kernel]) or when using "outptr" patterns (for FFI or otherwise). + +**Simple data types** (integers, booleans, etc.) can be created with `MaybeUninit::uninit` and initialized using pointer operations. + +**Composite data types** (structs, tuples) can also be created with `MaybeUninit::uninit`, and `offset_of!` can be used to initialize each of the subcomponents. + +With [this feature][feature_offset_of_enum] (currently in FCP with disposition merge at the time of writing), it is possible to also use `offset_of!` to **initialize the variant** of an enum in-place, as with structs. + +However, there is no stable way to set the discriminant of an enum, without fully creating the variant and writing it by-value to the destination. This means that any enum value must first be created on the stack, and written through to the final destination, even if the potentially large enum variant could be initialized in place. + +For example, TODAY if we had the type `Option<[u8; 1024 * 1024]>`, and would like to initialize it as `Some([0u8; 1024 * 1024])`, this would require creation of the 1MiB array on the stack, before it is written to its destination. Although there are various workarounds to *hopefully* have the compiler elide this copy, they are not guaranteed. + +With this RFC, it would be possible to zero-initialize the variant (meaning no 1MiB stack creation), and then set the discriminant to `Some`. + +[in the Linux kernel]: https://docs.rs/pinned-init/latest/pinned_init/ +[pods_from_scratch]: https://onevariable.com/blog/pods-from-scratch/ +[discussed on Zulip]: https://rust-lang.zulipchat.com/#narrow/channel/213817-t-lang/topic/Prior.20discussions.20for.20manually.20building.20an.20enum.3F +[feature_offset_of_enum]: https://github.com/rust-lang/rust/issues/120141 + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +This introduces two new library components, a `discriminant_of!` macro, and an unsafe `set_discriminant` function. + +## The `discriminant_of!` macro + +For manual construction of an enum, it is necessary to obtain the discriminant of an enum, without necessarily having an instance of that enum. + +This RFC proposes a macro that allows for creation of a [`Discriminant`](https://doc.rust-lang.org/stable/std/mem/struct.Discriminant.html), similar to [`discriminant()`](https://doc.rust-lang.org/stable/std/mem/fn.discriminant.html), but without requiring a reference to the enum. + +This would have the following form[^1]: + +```rust +// macro: discriminant_of!(Type, Variant Name); + +// Usable in normal let bindings +let discriminant: Discriminant> + = discriminant_of!(Option, Some); +// ^^^^^^^^^^^ --------> Type +// ^^^^ --> Variant Name + +// Also available in const context +const DISCRIMINANT: Discriminant> + = discriminant_of!(Result, Ok); +// ^^^^^^^^^^^^^^^^^^^ ------> Type +// ^^ --> Variant Name +``` + +If the provided **Type** is not an enum, or the provided **Variant Name** is not present in the list of variants of **Type**, then a compile time error will occur. + +```rust +discriminant_of!(u32, Some); // Not allowed +discriminant_of!(Option, Ok); // Not allowed +``` + +Compiler errors could look like this: + +```sh +discriminant_of!(u32, Some); + ^^^ -> ERROR: type is not an enum + +discriminant_of!(Option, Ok); + ^^ -> ERROR: type Option does not contain + the variant "Ok". Variants are "Some" or + "None". +``` + +[^1]: The syntax of [enum variant offsets][feature_offset_of_enum] is still in discussion, this RFC would adopt whatever syntax decisions made by that feature for consistency. The currently described syntax of that feature is used in this RFC as a placeholder. + +## The `set_discriminant` function + +This function is used for setting the discriminant of an enum, and has the following signature: + +```rust +// likely in std/core::mem +pub unsafe fn set_discriminant( + &mut MaybeUninit, // The enum being constructed + Discriminant, // The discriminant being set +); +``` + +This function MUST be called AT LEAST ONCE, AFTER fully initializing the values of the variant associated with this discriminant, and MUST be called BEFORE calling `assume_init` on the `MaybeUninit` enum. + +This could be used as follows: + +```rust +let mut out: MaybeUninit> = MaybeUninit::uninit(); +// Make a decision on runtime values +let init_some: Option = get_user_input(); + +if let Some(val) = init_some { + let opt_ptr: *mut Option<[u32; 1024]> = out.as_mut_ptr(); + // Tracking issue #120141 for enum variant offset_of! definition + let val_offset: usize = offset_of!(Option<[u32; 1024]>, Some.0); + let arr_ptr: *mut [u32; 1024] = opt_ptr.byte_add(val_offset).cast(); + let item_ptr: *mut u32 = arr_ptr.cast(); + // Initialize all items in the array to the user provided value + for i in 0..1024 { + // SAFETY: The variant body is always well aligned and valid for + // the size of the type, uninit fields are only written. + unsafe { + item_ptr.add(i).write(val); + } + } + // Set the discriminant + let discrim = discriminant_of!(Option<[u32; 1024]>, Some); + // SAFETY: We have initialized all fields for this variant, and + // this discriminant is correct for the type we are writing to. + unsafe { + set_discriminant(&mut out, discrim); + } +} else { + // No value to write, just set the discriminant, leaving the + // rest of the value uninitialized + // + // SAFETY: We have initialized all fields for this variant + // (which is 'no fields'), and this discriminant is correct + // for the type we are writing to. + unsafe { + // We can also use discriminant_of! without binding it + set_discriminant( + &mut out, + discriminant_of!(Option<[u32; 1024]>, None), + ); + } +} + +// This is now sound. We could also use `assume_init_ref` or +// `assume_init_mut` if we are explicitly avoiding potential +// copies by value, and the allocation is not local. +let out: Option<[u32; 1024]> = unsafe { out.assume_init() }; +assert_eq!(out.is_some(), init_some.is_some()); +if let Some(val) = init_some { + assert!(out.as_ref().unwrap().iter().all(|x| *x == val)); +} +``` + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +## The `discriminant_of!` macro + +This macro is the "valueless" version of `std::mem::discriminant`. It needs to be a macro, as we cannot otherwise name the variant. + +Unlike the `discriminant` function, it is possible at compile time to detect if this is called on a type that is not an `enum`, meaning that it is possible for it to be a compiler error in this case, rather than "unspecified but not UB" to use this with a non-enum type. + +This macro returns by value the same opaque `Discriminant` used by `std::mem::discriminant`. + +This RFC does not propose allowing obtaining the discriminant of "nested" fields, and will only work with a "top level" item, but does not preclude adding this capability in the future. This addition could be specified in a later RFC. For now users are recommended to handle "one level at a time". + +```rust +discriminant_of!(Option>, Some); // Allowed +discriminant_of!(Option>, Some.0.Some); // Not Allowed (yet) +``` + +## The `set_discriminant` function + +This function must be unsafe, as setting an incorrect discriminant could lead to undefined behavior by allowing safe code to observe incorrectly or incompletely initialized values. + +This function takes an `&mut MaybeUninit` as it fulfills multiple requirements necessary for safety: + +* The enum and the selected variant's fields must be well-aligned for reading and writing +* The enum is not fully initialized, as its discriminant may not have been set prior to calling this function +* This function is allowed to write and read-back both the discriminant and body (whether they each exist or not, and whether the discriminant and body are separate or not). This function also may do this in an unsynchronized manner (not requiring locks or atomic operations), which means an exclusive `&mut` is appropriate + +If the `T` used for `&mut MaybeUninit` or `Discriminant` when calling this function is NOT an enum, then this function is specified as a no-op. This is not undefined behavior, but later calls to `assume_init` may be undefined behavior if the `T` was not properly initialized. This is also allowed (but not required) to cause a debug assertion failure to aid in testing (implementor's choice). + +This function MUST be called AFTER fully initializing the variant of the enum completely, as in some cases it may be necessary to read these values. This is discussed more in the next section. + +## Interactions with niche-packed types + +This RFC is intended to work with ANY enum types, including those with niche representations. This includes types like `Option<&u32>` or `Option`, where excluded values (like `null`) are used for the `None` variant. + +In these cases, the discriminant and variant body are fully overlapping, rather than being independant memory locations. + +This RFC STILL requires users to explicitly call `set_discriminant`, even if the act of setting the value also sets the discriminant implicitly, for example by writing `null` to the body of `Option<&u32>` via pointer methods or casting. + +```rust +let refr: &u32 = &123u32; +let mut out: MaybeUninit> = MaybeUninit::uninit(); + +let opt_ptr: *mut Option<&u32> = out.as_mut_ptr(); +// Tracking issue #120141 for enum variant offset_of! definition +let val_offset: usize = offset_of!(Option<&u32>, Some.0); +let val_ptr: *mut &u32 = opt_ptr.byte_add(val_offset).cast(); + +unsafe { + // Sets the value of the body + val_ptr.write(refr); + // Technically does nothing, but is still required + set_discriminant( + &mut out, + discriminant_of!(Option<&u32>, Some), + ); +} +let out: Option<&u32> = unsafe { out.assume_init() }; +assert_eq!(out, Some(refr)); +``` + +In these cases, `set_discriminant` would be responsible for also initializing the whole body of the variant. For example, this would be sufficient initialization: + +```rust +let mut out: MaybeUninit> = MaybeUninit::uninit(); +unsafe { + set_discriminant( + &mut out, + discriminant_of!(Option<&u32>, None), + ); +} +let out: Option<&u32> = unsafe { out.assume_init() }; +assert_eq!(out, None); +``` + +# Drawbacks +[drawbacks]: #drawbacks + +* This RFC increases the API surface of the standard library +* This RFC adds unsafe methods which must be considered, documented, and tested +* The abilities of this RFC could technically already be done today, either for carefully selected subsets of enums (specific reprs, hand-built enums using explicit discriminant and `union`'d value fields), or be done today less efficiently with explicit value creations and copies +* The features added to this RFC generally only benefit "power users", likely library authors already doing unsafe things + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +This RFC is necessary to "complete" the ability to soundly create enums in-place. + +Without this capability, downstream library authors are required to make a choice between: + +* Suboptimal stack usage and performance, in the case of by-value initializations +* Restriction of functionality to a subset of enum reprs, or bespoke proxy types that emulate enums to allow for in-place construction + +This is necessary to be part of the language and/or standard library as enum representations (at least for the default `repr(Rust)`) are unspecified, and particularly with concepts like niche-packed enums, it is not possible to soundly handle this in the general case of user defined enum types. + +I am not aware of any other proposed designs for this functionality. + +# Prior art +[prior-art]: #prior-art + +The `rkyv` crate defines [proxy enums] with explict `#[repr(N)]` (where N is an unsigned integer) types. + +[proxy enums]: https://rkyv.org/format.html + +Syntax for `discriminant_of!` is inspired by the in-progress [offset_of_enum feature][feature_offset_of_enum]. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +## Should we use `&mut MaybeUninit` for `set_discriminant`? + +There was some discussion when writing this RFC what the type of the first argument to `set_discriminant` should be: + +* `&mut MaybeUninit` +* `NonNull` +* `*mut T` + +From the perspective of "making expectations clear", I think `&mut MaybeUninit` is the best choice, as it makes the requirements around alignment, exclusive access, etc. enumerated above explicit. + +However, it is possible that when constructing multiple nested types in-place, the user could be creating the pointers to the enum using `offset_of!` or `addr_of_mut!`, which would require the user to do some casting to turn their `*mut ()` (or any other `void*` analog) into a maybeuninit just to create an exclusive reference to it. + +We could use a "weaker" pointer type like `NonNull` or `*mut T` instead, and cover the requirements in the "SAFETY" comment blocks, rather than "inheriting" these requirements from `&mut MaybeUninit`. + +We could also have multiple versions of `set_discriminant`, like `set_discriminant_nonnull` or `set_discriminant_ptr` for these. + +# Future possibilities +[future-possibilities]: #future-possibilities + +## Should `discriminant_of!` support nested usage? + +Discussed above, we said this was not allowed: + +```rust +discriminant_of!(Option>, Some); // Allowed +discriminant_of!(Option>, Some.0.Some); // Not Allowed (yet) +``` + +Should we allow this in the future? + +## Should we support construction of unaligned enums? + +If the user is writing to a packed format, they could potentially want the ability to set the discriminant in cases where the enum discriminant or fields are not well-aligned. + +This could require the creation of a `set_discriminant_unaligned` function, that relaxes the well-aligned safety requirements of the proposed `set_discriminant`. From 23d467341476d64f2eaf679cc445e929097d7f81 Mon Sep 17 00:00:00 2001 From: James Munns Date: Fri, 8 Nov 2024 12:57:10 +0100 Subject: [PATCH 2/7] Assign RFC ID --- ...safe-set-discriminant.md => 3727-unsafe-set-discriminant.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename text/{0000-unsafe-set-discriminant.md => 3727-unsafe-set-discriminant.md} (99%) diff --git a/text/0000-unsafe-set-discriminant.md b/text/3727-unsafe-set-discriminant.md similarity index 99% rename from text/0000-unsafe-set-discriminant.md rename to text/3727-unsafe-set-discriminant.md index 14b4fd1264f..e0db7eb08ae 100644 --- a/text/0000-unsafe-set-discriminant.md +++ b/text/3727-unsafe-set-discriminant.md @@ -1,6 +1,6 @@ - Feature Name: unsafe_set_discriminant - Start Date: 2024-11-08 -- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- RFC PR: [rust-lang/rfcs#3727](https://github.com/rust-lang/rfcs/pull/3727) - Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) # Summary From 557d1f3d273a21379797688ef90dcf0cf7b03ba9 Mon Sep 17 00:00:00 2001 From: James Munns Date: Mon, 11 Nov 2024 11:44:50 +0100 Subject: [PATCH 3/7] Update text/3727-unsafe-set-discriminant.md Co-authored-by: Travis Cross --- text/3727-unsafe-set-discriminant.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3727-unsafe-set-discriminant.md b/text/3727-unsafe-set-discriminant.md index e0db7eb08ae..53a66480eb1 100644 --- a/text/3727-unsafe-set-discriminant.md +++ b/text/3727-unsafe-set-discriminant.md @@ -21,7 +21,7 @@ In-place construction is useful for a number of niche applications, including in **Composite data types** (structs, tuples) can also be created with `MaybeUninit::uninit`, and `offset_of!` can be used to initialize each of the subcomponents. -With [this feature][feature_offset_of_enum] (currently in FCP with disposition merge at the time of writing), it is possible to also use `offset_of!` to **initialize the variant** of an enum in-place, as with structs. +With [this feature][feature_offset_of_enum] (currently proposed for FCP merge at the time of writing), it is possible to also use `offset_of!` to **initialize the variant** of an enum in-place, as with structs. However, there is no stable way to set the discriminant of an enum, without fully creating the variant and writing it by-value to the destination. This means that any enum value must first be created on the stack, and written through to the final destination, even if the potentially large enum variant could be initialized in place. From e77711c2ae348bb6adcb0ba0c6195f3b5de55d63 Mon Sep 17 00:00:00 2001 From: James Munns Date: Mon, 11 Nov 2024 12:58:32 +0100 Subject: [PATCH 4/7] Address review comments --- text/3727-unsafe-set-discriminant.md | 79 +++++++++++++++++----------- 1 file changed, 48 insertions(+), 31 deletions(-) diff --git a/text/3727-unsafe-set-discriminant.md b/text/3727-unsafe-set-discriminant.md index 53a66480eb1..a44938d0967 100644 --- a/text/3727-unsafe-set-discriminant.md +++ b/text/3727-unsafe-set-discriminant.md @@ -23,7 +23,7 @@ In-place construction is useful for a number of niche applications, including in With [this feature][feature_offset_of_enum] (currently proposed for FCP merge at the time of writing), it is possible to also use `offset_of!` to **initialize the variant** of an enum in-place, as with structs. -However, there is no stable way to set the discriminant of an enum, without fully creating the variant and writing it by-value to the destination. This means that any enum value must first be created on the stack, and written through to the final destination, even if the potentially large enum variant could be initialized in place. +However, there is no stable way to set the discriminant of an enum in the general case, without fully creating the variant and writing it by-value to the destination. This means that any enum value must first be created on the stack, and written through to the final destination, even if the potentially large enum variant could be initialized in place. For example, TODAY if we had the type `Option<[u8; 1024 * 1024]>`, and would like to initialize it as `Some([0u8; 1024 * 1024])`, this would require creation of the 1MiB array on the stack, before it is written to its destination. Although there are various workarounds to *hopefully* have the compiler elide this copy, they are not guaranteed. @@ -74,7 +74,7 @@ Compiler errors could look like this: ```sh discriminant_of!(u32, Some); - ^^^ -> ERROR: type is not an enum + ^^^ -> ERROR: "u32" is not an enum discriminant_of!(Option, Ok); ^^ -> ERROR: type Option does not contain @@ -89,10 +89,9 @@ discriminant_of!(Option, Ok); This function is used for setting the discriminant of an enum, and has the following signature: ```rust -// likely in std/core::mem pub unsafe fn set_discriminant( - &mut MaybeUninit, // The enum being constructed - Discriminant, // The discriminant being set + *mut T, // The enum being constructed + Discriminant, // The discriminant being set ); ``` @@ -124,7 +123,7 @@ if let Some(val) = init_some { // SAFETY: We have initialized all fields for this variant, and // this discriminant is correct for the type we are writing to. unsafe { - set_discriminant(&mut out, discrim); + set_discriminant(out.as_mut_ptr(), discrim); } } else { // No value to write, just set the discriminant, leaving the @@ -136,7 +135,7 @@ if let Some(val) = init_some { unsafe { // We can also use discriminant_of! without binding it set_discriminant( - &mut out, + out.as_mut_ptr(), discriminant_of!(Option<[u32; 1024]>, None), ); } @@ -167,30 +166,41 @@ This RFC does not propose allowing obtaining the discriminant of "nested" fields ```rust discriminant_of!(Option>, Some); // Allowed -discriminant_of!(Option>, Some.0.Some); // Not Allowed (yet) +discriminant_of!(Option>, Some.0.Some); // Not Allowed ``` ## The `set_discriminant` function This function must be unsafe, as setting an incorrect discriminant could lead to undefined behavior by allowing safe code to observe incorrectly or incompletely initialized values. -This function takes an `&mut MaybeUninit` as it fulfills multiple requirements necessary for safety: +This function takes an `*mut T` and has multiple requirements necessary for safety: -* The enum and the selected variant's fields must be well-aligned for reading and writing -* The enum is not fully initialized, as its discriminant may not have been set prior to calling this function -* This function is allowed to write and read-back both the discriminant and body (whether they each exist or not, and whether the discriminant and body are separate or not). This function also may do this in an unsynchronized manner (not requiring locks or atomic operations), which means an exclusive `&mut` is appropriate +* The pointer `*mut T` must be non-null +* The enum `T` and the selected variant's fields must be well-aligned for reading and writing +* This function is allowed to write and read-back both the discriminant and body (whether they each exist or not, and whether the discriminant and body are separate or not). This function also may do this in an unsynchronized manner (not requiring locks or atomic operations), which means an exclusive access is required -If the `T` used for `&mut MaybeUninit` or `Discriminant` when calling this function is NOT an enum, then this function is specified as a no-op. This is not undefined behavior, but later calls to `assume_init` may be undefined behavior if the `T` was not properly initialized. This is also allowed (but not required) to cause a debug assertion failure to aid in testing (implementor's choice). +If the `T` used for `*mut T` or `Discriminant` when calling this function is NOT an enum, then this function is specified as a no-op. This is not undefined behavior, but later calls to `assume_init` may be undefined behavior if the `T` was not properly initialized. This is also allowed (but not required) to cause a debug assertion failure to aid in testing (implementor's choice). -This function MUST be called AFTER fully initializing the variant of the enum completely, as in some cases it may be necessary to read these values. This is discussed more in the next section. +When this function is called, it MUST be called AFTER fully initializing the variant of the enum completely, as in some cases it may be necessary to read back these values. This is discussed more in the next section. + +Semantically, `set_discriminant` is specified to optionally write the discriminant (when necessary), and read-back the discriminant. If the read-back discriminant does not match the expected value, then the behavior is undefined. + +### Alternate forms of `set_discriminant` + +This RFC only specifies one form of `set_discriminant`, which takes an `*mut T` to the enum. It is expected that additional versions of this function will also be added for convenience in the future, for example taking other pointer types like: + +* `unsafe fn set_discriminant_mu(&mut MaybeUninit, Discriminant);` +* `unsafe fn set_discriminant_nn(NonNull, Discriminant);` + +These forms are not specified as part of this RFC, and can be added without an additional RFC in the future. Alternate forms are expected to match the semantics of the pointer-based version of `set_discriminant`. ## Interactions with niche-packed types This RFC is intended to work with ANY enum types, including those with niche representations. This includes types like `Option<&u32>` or `Option`, where excluded values (like `null`) are used for the `None` variant. -In these cases, the discriminant and variant body are fully overlapping, rather than being independant memory locations. +In these cases, the discriminant and variant body are fully overlapping, rather than being independent memory locations. -This RFC STILL requires users to explicitly call `set_discriminant`, even if the act of setting the value also sets the discriminant implicitly, for example by writing `null` to the body of `Option<&u32>` via pointer methods or casting. +This RFC recommends users to explicitly call `set_discriminant`, even if the act of setting the value also sets the discriminant implicitly, for example by writing `null` to the body of `Option<&u32>` via pointer methods or casting. ```rust let refr: &u32 = &123u32; @@ -204,9 +214,9 @@ let val_ptr: *mut &u32 = opt_ptr.byte_add(val_offset).cast(); unsafe { // Sets the value of the body val_ptr.write(refr); - // Technically does nothing, but is still required + // Does not affect the contents of `out` set_discriminant( - &mut out, + out.as_mut_ptr(), discriminant_of!(Option<&u32>, Some), ); } @@ -214,13 +224,13 @@ let out: Option<&u32> = unsafe { out.assume_init() }; assert_eq!(out, Some(refr)); ``` -In these cases, `set_discriminant` would be responsible for also initializing the whole body of the variant. For example, this would be sufficient initialization: +In the case of the niche variant, `set_discriminant` would be responsible for also initializing the whole body of the variant. For example, this would be sufficient initialization: ```rust let mut out: MaybeUninit> = MaybeUninit::uninit(); unsafe { set_discriminant( - &mut out, + out.as_mut_ptr(), discriminant_of!(Option<&u32>, None), ); } @@ -228,6 +238,19 @@ let out: Option<&u32> = unsafe { out.assume_init() }; assert_eq!(out, None); ``` +## Specifically known types vs unknown types + +This RFC does not invalidate any currently-accepted ways of initializing enums manually without these new functions. For example, enums with a [primitive representation], or niche represented enums with [discriminant elision], can be soundly created today. + +When specific types with these qualities are used, it is not required, but still allowed, to use the `set_discriminant` function to set the discriminant. + +However, when authoring generic or macro code, which may potentially accept types that do not have these qualities, it is required to call `set_discriminant` to fully initialize the enum in the general case. + +For example, if a macro was used to populate an `Option` value, and users could chose `U: u32` (which does not have a niche repr), OR `U: &u32` (which does have a niche repr), then the author of the macro should call `set_discriminant` to soundly initialize the `Option` in all cases. + +[primitive representation]: https://doc.rust-lang.org/reference/items/enumerations.html#pointer-casting +[discriminant elision]: https://rust-lang.github.io/unsafe-code-guidelines/layout/enums.html#discriminant-elision-on-option-like-enums + # Drawbacks [drawbacks]: #drawbacks @@ -243,7 +266,7 @@ This RFC is necessary to "complete" the ability to soundly create enums in-place Without this capability, downstream library authors are required to make a choice between: -* Suboptimal stack usage and performance, in the case of by-value initializations +* Suboptimal stack usage and performance, in the case of by-value initialization * Restriction of functionality to a subset of enum reprs, or bespoke proxy types that emulate enums to allow for in-place construction This is necessary to be part of the language and/or standard library as enum representations (at least for the default `repr(Rust)`) are unspecified, and particularly with concepts like niche-packed enums, it is not possible to soundly handle this in the general case of user defined enum types. @@ -253,7 +276,7 @@ I am not aware of any other proposed designs for this functionality. # Prior art [prior-art]: #prior-art -The `rkyv` crate defines [proxy enums] with explict `#[repr(N)]` (where N is an unsigned integer) types. +The `rkyv` crate defines [proxy enums] with explict `#[repr(N)]` (where N is an unsigned integer) types, in order to allow in-place construction and access [proxy enums]: https://rkyv.org/format.html @@ -262,7 +285,7 @@ Syntax for `discriminant_of!` is inspired by the in-progress [offset_of_enum fea # Unresolved questions [unresolved-questions]: #unresolved-questions -## Should we use `&mut MaybeUninit` for `set_discriminant`? +## Should we provide alternate forms of `set_discriminant`? There was some discussion when writing this RFC what the type of the first argument to `set_discriminant` should be: @@ -270,13 +293,7 @@ There was some discussion when writing this RFC what the type of the first argum * `NonNull` * `*mut T` -From the perspective of "making expectations clear", I think `&mut MaybeUninit` is the best choice, as it makes the requirements around alignment, exclusive access, etc. enumerated above explicit. - -However, it is possible that when constructing multiple nested types in-place, the user could be creating the pointers to the enum using `offset_of!` or `addr_of_mut!`, which would require the user to do some casting to turn their `*mut ()` (or any other `void*` analog) into a maybeuninit just to create an exclusive reference to it. - -We could use a "weaker" pointer type like `NonNull` or `*mut T` instead, and cover the requirements in the "SAFETY" comment blocks, rather than "inheriting" these requirements from `&mut MaybeUninit`. - -We could also have multiple versions of `set_discriminant`, like `set_discriminant_nonnull` or `set_discriminant_ptr` for these. +`*mut T` was chosen as the most general option, however it is likely desirable to accept other forms as well for convenience. # Future possibilities [future-possibilities]: #future-possibilities @@ -287,7 +304,7 @@ Discussed above, we said this was not allowed: ```rust discriminant_of!(Option>, Some); // Allowed -discriminant_of!(Option>, Some.0.Some); // Not Allowed (yet) +discriminant_of!(Option>, Some.0.Some); // Not Allowed ``` Should we allow this in the future? From f1a12246d093a532692be7311670ec8a000328f4 Mon Sep 17 00:00:00 2001 From: James Munns Date: Mon, 11 Nov 2024 13:01:08 +0100 Subject: [PATCH 5/7] Add note re: unaligned discriminant reads as well. --- text/3727-unsafe-set-discriminant.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/3727-unsafe-set-discriminant.md b/text/3727-unsafe-set-discriminant.md index a44938d0967..1a7c468c267 100644 --- a/text/3727-unsafe-set-discriminant.md +++ b/text/3727-unsafe-set-discriminant.md @@ -314,3 +314,5 @@ Should we allow this in the future? If the user is writing to a packed format, they could potentially want the ability to set the discriminant in cases where the enum discriminant or fields are not well-aligned. This could require the creation of a `set_discriminant_unaligned` function, that relaxes the well-aligned safety requirements of the proposed `set_discriminant`. + +There is also currently no way to read the discriminant of an unaligned enum, so it may also be necessary to add unaligned versions of the `discriminant()` function as well. From 1ce17abbd5e37de4321864e8f26784526603c279 Mon Sep 17 00:00:00 2001 From: James Munns Date: Mon, 11 Nov 2024 13:12:31 +0100 Subject: [PATCH 6/7] A few slight wording fix-ups --- text/3727-unsafe-set-discriminant.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/text/3727-unsafe-set-discriminant.md b/text/3727-unsafe-set-discriminant.md index 1a7c468c267..81e726e3d99 100644 --- a/text/3727-unsafe-set-discriminant.md +++ b/text/3727-unsafe-set-discriminant.md @@ -95,7 +95,7 @@ pub unsafe fn set_discriminant( ); ``` -This function MUST be called AT LEAST ONCE, AFTER fully initializing the values of the variant associated with this discriminant, and MUST be called BEFORE calling `assume_init` on the `MaybeUninit` enum. +This function MUST be called AFTER fully initializing the values of the variant associated with this discriminant, and when called, MUST be called BEFORE calling `assume_init` on the `MaybeUninit` enum. This could be used as follows: @@ -177,9 +177,9 @@ This function takes an `*mut T` and has multiple requirements necessary for safe * The pointer `*mut T` must be non-null * The enum `T` and the selected variant's fields must be well-aligned for reading and writing -* This function is allowed to write and read-back both the discriminant and body (whether they each exist or not, and whether the discriminant and body are separate or not). This function also may do this in an unsynchronized manner (not requiring locks or atomic operations), which means an exclusive access is required +* This function is allowed to write and read-back both the discriminant and body (whether they each exist or not, and whether the discriminant and body are separate or not). This function also may do this in an unsynchronized manner (not requiring locks or atomic operations), which means exclusive access is required -If the `T` used for `*mut T` or `Discriminant` when calling this function is NOT an enum, then this function is specified as a no-op. This is not undefined behavior, but later calls to `assume_init` may be undefined behavior if the `T` was not properly initialized. This is also allowed (but not required) to cause a debug assertion failure to aid in testing (implementor's choice). +If the `T` used for `*mut T` or `Discriminant` when calling this function is NOT an enum, then this function is specified as a no-op. This is not undefined behavior, but later calls to `assume_init` or usage of the value `T` may be undefined behavior if the `T` was not properly initialized. This is also allowed (but not required) to cause a debug assertion failure to aid in testing (implementor's choice). When this function is called, it MUST be called AFTER fully initializing the variant of the enum completely, as in some cases it may be necessary to read back these values. This is discussed more in the next section. @@ -187,7 +187,7 @@ Semantically, `set_discriminant` is specified to optionally write the discrimina ### Alternate forms of `set_discriminant` -This RFC only specifies one form of `set_discriminant`, which takes an `*mut T` to the enum. It is expected that additional versions of this function will also be added for convenience in the future, for example taking other pointer types like: +This RFC only specifies one form of `set_discriminant`, which takes an `*mut T` to the enum `T`. It is expected that additional versions of this function will also be added for convenience in the future, for example taking other pointer types like: * `unsafe fn set_discriminant_mu(&mut MaybeUninit, Discriminant);` * `unsafe fn set_discriminant_nn(NonNull, Discriminant);` @@ -244,7 +244,7 @@ This RFC does not invalidate any currently-accepted ways of initializing enums m When specific types with these qualities are used, it is not required, but still allowed, to use the `set_discriminant` function to set the discriminant. -However, when authoring generic or macro code, which may potentially accept types that do not have these qualities, it is required to call `set_discriminant` to fully initialize the enum in the general case. +However, when authoring generic or macro code, which may potentially accept types that do not have these qualities, it is necessary to call `set_discriminant` to fully initialize the enum in the general case. For example, if a macro was used to populate an `Option` value, and users could chose `U: u32` (which does not have a niche repr), OR `U: &u32` (which does have a niche repr), then the author of the macro should call `set_discriminant` to soundly initialize the `Option` in all cases. From eb486ef529eea98b7d2d9b0ffcd2254ac16a4ea0 Mon Sep 17 00:00:00 2001 From: James Munns Date: Mon, 11 Nov 2024 13:16:09 +0100 Subject: [PATCH 7/7] A few more wording fix-ups --- text/3727-unsafe-set-discriminant.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/text/3727-unsafe-set-discriminant.md b/text/3727-unsafe-set-discriminant.md index 81e726e3d99..9957d44e648 100644 --- a/text/3727-unsafe-set-discriminant.md +++ b/text/3727-unsafe-set-discriminant.md @@ -95,7 +95,7 @@ pub unsafe fn set_discriminant( ); ``` -This function MUST be called AFTER fully initializing the values of the variant associated with this discriminant, and when called, MUST be called BEFORE calling `assume_init` on the `MaybeUninit` enum. +This function MUST be called AFTER fully initializing the values of the variant associated with this discriminant, and when called, MUST be called BEFORE use of the enum, for example calling `assume_init` on a `MaybeUninit` or creation of a reference from a pointer to the uninitialized enum. This could be used as follows: @@ -118,8 +118,10 @@ if let Some(val) = init_some { item_ptr.add(i).write(val); } } - // Set the discriminant + // Obtain the discriminant let discrim = discriminant_of!(Option<[u32; 1024]>, Some); + // Set the discriminant + // // SAFETY: We have initialized all fields for this variant, and // this discriminant is correct for the type we are writing to. unsafe {