Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for enum access in offset_of #120141

Open
1 of 3 tasks
GKFX opened this issue Jan 19, 2024 · 35 comments
Open
1 of 3 tasks

Tracking Issue for enum access in offset_of #120141

GKFX opened this issue Jan 19, 2024 · 35 comments
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. F-offset_of_enum `#![feature(offset_of_enum)]` I-lang-nominated Nominated for discussion during a lang team meeting. proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@GKFX
Copy link
Contributor

GKFX commented Jan 19, 2024

Feature gate: #![feature(offset_of_enum)]

This is a tracking issue for using enum variants in offset_of. Enum variants themselves do not have an offset within their type, so the macro will not give an offset for them, but their fields do. For example, the standard library uses the offset of the content of the Some variant of Option to implement Option::as_slice. The original RFC for offset_of was rust-lang/rfcs#3308.

Public API

pub macro offset_of($Container:ty, $($fields:expr)+ $(,)?) { ... }

const OFFSET: usize = offset_of!(Option<u32>, Some.0);

Steps / History

Unresolved Questions

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

@GKFX GKFX added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jan 19, 2024
@est31 est31 added the F-offset_of `#![feature(offset_of)]` label Jan 19, 2024
@fmease fmease added F-offset_of_enum `#![feature(offset_of_enum)]` and removed F-offset_of `#![feature(offset_of)]` labels Feb 17, 2024
@jamesmunns
Copy link
Member

Noting one potential limitation to offset_of with enums: without a way to set the discriminant, it cannot be used to manually initialize an enum.

Blog post explainer of what I mean: https://onevariable.com/blog/pods-from-scratch/

Zulip thread for discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Prior.20discussions.20for.20manually.20building.20an.20enum.3F/near/441992612

@dingxiangfei2009
Copy link
Contributor

dingxiangfei2009 commented Jun 11, 2024

Stabilization Report - Abstract

offset_of! macro is a macro at expression location and it computes the usize offset of, possibly nested, fields in the source type. This report concerns the use of this macro to compute offsets into enums meaningfully, namely covering support tracked by the following two issues.

Feature Summary

The macro computes field offsets for both structs and enums, respecting field privacy.

This macro accepts two argument. The first is the source type that

// crate a
#![feature(offset_of_enum)]
#![feature(offset_of_nested)]

struct AStruct {
    pub field: u8,
    field2: u32,
}

enum AEnum {
    Variant1(u8),
    Variant2 {
        field: u8
    },
    Variant3(AStruct),
}

struct ANestingStruct(AStruct);

// readings as of Rust 1.78.0
// DO NOT EXPECT THESE ASSERTIONS TO WORK ACROSS VERSIONS
fn main() {
    assert_eq!(offset_of!(AStruct, field), 4);
    assert_eq!(offset_of!(AStruct, field2), 0);
    assert_eq!(offset_of!(AEnum, Variant2.field), 1);
    assert_eq!(offset_of!(AEnum, Variant3.0.field), 8);
    assert_eq!(offset_of!(ANestingStruct, 0.field), 4);
}

By respecting privacy, suppose that in another crate b

// crate b
#![feature(offset_of_enum)]
#![feature(offset_of_nested)]

fn main() {
    // these still work
    assert_eq!(offset_of!(AStruct, field), 4);
    assert_eq!(offset_of!(AEnum, Variant2.field), 1);
    assert_eq!(offset_of!(AEnum, Variant3.0.field), 8);
    // this does not compile because field2 is private here
    // assert_eq!(offset_of!(AStruct, field2), 0);
    // this does not compile because 0 is private here
    // assert_eq!(offset_of!(ANestingStruct, 0.field), 4);
}

Documentation

The rustdoc of our core library has been updated with code example enabled by these two feature gates.

Test cases and edge cases

The basic test cases for enums and nesting structures is covered by the following tests, including edge cases for parsing rule involving the field specifiction part of offset_of! macro calls.

Unresolved questions

There has been discussion around in-place construction of enum values and the possibility to know the location of discriminant with offset_of! and manipulate discriminant value. This is still under inception phase and will possibly be supported by a separate utility or interface for safety and better UX.

@dingxiangfei2009
Copy link
Contributor

I discovered that in feature(generic_const_exprs) the support for THIR + MIR OffsetOf is sort of half-baked.

Given that generic_const_exprs is incomplete, the only remaining concern is to use it meaningfully at anonymous constant context, or otherwise it should not block stabilization of it for use in const fn which is a viable substitute for a number of scenarios that I can think of. In any case, I am taking a small initiative to tackle it. 🤞

@Amanieu Amanieu added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Jun 17, 2024
@Amanieu
Copy link
Member

Amanieu commented Jun 17, 2024

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Jun 17, 2024

Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Jun 17, 2024
@Amanieu
Copy link
Member

Amanieu commented Jun 17, 2024

There has been discussion around in-place construction of enum values and the possibility to know the location of discriminant with offset_of! and manipulate discriminant value. This is still under inception phase and will possibly be supported by a separate utility or interface for safety and better UX.

I don't think this is a blocker: the discriminant isn't encoded as a real field with an offset, so it can't be represented with offset_of anyways.

@joshtriplett
Copy link
Member

@rfcbot concern consistent-future-syntax

Given that this item (unlike nested fields) isn't as urgent, and that we don't have any syntax for this elsewhere in the language (unlike nested fields), I'd like to make sure that either:

  1. The syntax we choose here is consistent with any future syntax we might want to use elsewhere in the language, or
  2. We don't actually want to provide any syntax for this elsewhere in the language, so we're fine with inventing a syntax for this here.

(We currently have pattern syntax for extracting fields, but the syntax proposed for offset_of here is not pattern syntax. That doesn't necessarily mean it should be, but that means we're introducing a new syntax here and we should decide if that's the syntax we want.)

@GKFX
Copy link
Contributor Author

GKFX commented Jun 19, 2024

elsewhere in the language

The same syntax could be used for the proposed field_of! macro, RFC 3318. It could possibly be used for the proposed discriminant_of!, although this seems less necessary as supporting discriminant_of!(Enum, Variant) alone would likely be sufficient. A type_of!(Type, fields) has also been suggested.

Outside of reflection-related macros, the syntax o.Some.0 (for an Option o) implies to me accessing the Some variant of an enum without doing pattern matching. This would mostly likely be an unsafe operation (unless the user had somehow previously proved to the compiler that o is a Some). Unsafely accessing an enum variant can already be done with let Some(o) = o else { unreachable_unchecked() };, and is unusual enough that I doubt it will ever need a dedicated syntax. Alternatively, o.Some.0 could be understood to be in some way fallible, like a more concise alternative to pattern-matching, but again that seems unlikely to actually happen.

@joshtriplett joshtriplett added the I-lang-nominated Nominated for discussion during a lang team meeting. label Aug 6, 2024
@joshtriplett
Copy link
Member

Marking lang nominated to discuss how we want the future landscape to look here, and how we want to provide this functionality. (There's enough of a use case for this; the question is syntax.)

@jamesmunns
Copy link
Member

Hey @joshtriplett, is this scheduled for a meeting agenda? If possible, I'd love to observe that meeting! I'm still very interested in the ability to construct enums in-place for deserialization reasons

@joshtriplett
Copy link
Member

It'll be in an upcoming meeting, but we don't have it scheduled for a particular date. cc @traviscross who would have a better idea of timing.

@exrok
Copy link

exrok commented Sep 18, 2024

With feature(offset_of_enum) alone it's possible to construct enums in-place with atleast opt-level=1 (for dead store elimination).

At opt-level=0 there are extra copies to the stack & back but that's pretty normal for opt-level=0.

See: https://rust.godbolt.org/z/jz8qc1Pqv

Basically, we can emulate SetDiscriminant although with slightly different semantics.

In-place construction is done by initializing fields in-place using offset_of just like you would a struct then using something like the following:

/// initializes `dst` to be valid AnEnum::X containing the fields already initialized in `dst`
/// # Safety
/// - `dst` must be properly aligned pointing to an `AnEnum`
/// - `dst` must be valid for writes
/// -  The value pointed at by `dst` must have all fields of variant `X` initialized and valid.
unsafe fn set_discriminant_x(dst: NonNull<()>) {
    // All the reads that immediately write the same value to the same location
    // get eliminated with alteast `opt-level=1` leaving only the discriminant write, if any.
    let value = AnEnum::X {
        a: dst.byte_add(offset_of!(AnEnum, X.a)).cast::<u32>().read(),
        b: dst.byte_add(offset_of!(AnEnum, X.b)).cast::<u64>().read(),
        c: dst.byte_add(offset_of!(AnEnum, X.c)).cast::<Box<str>>().read(),
    }; 
    dst.cast::<AnEnum>().write(value);
}

enum AnEnum {
    X{ a: u32, b: u64, c: Box<str>  },
    Y([u8; 12]),
}

Note: Ideally a derive macro is generating the above function.

By using a function pointer you can set the discriminant in a type erased fashion as well.
Also, even with many fields (20) the optimization consistently occurs in my tests so far.
I believe this should resolve the limitation @jamesmunns faced, I think.

Of course, it would still be great to have dedicated support for manipulating discriminant.
However, I agree that the lack of such support should not block feature(offset_of_enum).

Full Zulip thread about the workaround: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/.E2.9C.94.20Workarounds.20for.20setting.20enum.20discriminants/near/470331454

@nikomatsakis
Copy link
Contributor

My take:

I don't know any other place in the language where we might want to use this kind of syntax, though I can imagine it (e.g., MIR has similar paths of this kind). If we want certainty, though, the only syntax that we can be SURE will not conflict with anything else would be to use pattern matching.

Have folks explored pattern matching and what that might look like? I was imagining something like "supply a pattern with exactly one binding" (e.g., offset_of!(path @ Some(x))).

@nikomatsakis
Copy link
Contributor

(Personally though I feel reasonably good about the proposed Variant.field syntax, and we could of course deprecate it in the future if it proves to be a mistake...)

@jamesmunns
Copy link
Member

jamesmunns commented Nov 5, 2024

Just confirming what @exrok said - I don't think "set discriminant" functionality should block this FCP, I'm working on a follow-up RFC for that, and that RFC would build on top of this one.

Edit: I've opened a PR for this RFC: rust-lang/rfcs#3727

@steffahn
Copy link
Member

steffahn commented Dec 23, 2024

I think for stabilization, offset_of! for enums should explicitly clarify its interaction with uninhabited field types.

In particular, I seems like this macro (if applicable to repr(Rust) enums) prevents any future introduction of layout optimizations such as the one discussed in rust-lang/unsafe-code-guidelines#443, unless the offset_of! macro comes with very clear & explicit statements banning [at least certain aspects of] usage1 with uninhabited fields and/or uninitialized enums. If repr(Rust) enum layout would ever want to be able to support optimizing away uninhabited variants completely2, then the following would need to be UB:

#![feature(offset_of_enum)]

use std::convert::Infallible;
use std::mem::offset_of;
use std::mem::MaybeUninit;

fn main() {
    let mut u = MaybeUninit::<Option<(u8, Infallible)>>::uninit();
    let p: *mut Option<(u8, Infallible)> = u.as_mut_ptr();
    unsafe {
        let p: *mut u8 = p
            .byte_add(offset_of!(Option<(u8, Infallible)>, Some.0.0))
            .cast();

        p.write(42);
    }
}

Footnotes

  1. either usage of offset_of! itself, or usage of the computed offset to do pointer arithmetic and dereference the result

  2. i.e. including their field data even if it isn’t 0-sized; for more context, click the UCG link

@steffahn
Copy link
Member

Also, perhaps someone could clarify the general process that is being followed here:
Wouldn't offset_of for enums require an RFC? The RFC that's linked here only mentions the enum support in a future possibilities section which is officially (quoting from the template) pretty much just a

good place to "dump ideas", if they are out of scope for the RFC

@RalfJung
Copy link
Member

To be clear, the compiler already optimizes away uninhabited variants to some extent (specifically, they are skipped when assigning niched tags), but currently it does ensure there is sufficient storage even for uninhabited variants. This is the enum version of the fact that (i32, !) has size 4.

So the question is whether one can assume that offset_of! on a field type T will always give an offset O such that O + size_of::<T>() does not exceed the size of the surrounding type. (And furthermore, the ranges covered in this way by different fields of the same variant are disjoint.) I am not sure if we even guarantee that for struct types... though we probably should as unsafe code is likely to already rely on this for field-by-field in-place initialization of a struct. The same argument also applies to enums though; in-place initialization of enums is also something people are trying to enable (see rust-lang/rfcs#3727).


Regarding the procedural question, I think such extensions have been done with team FCPs in the past. However, it is a user-visible change, and in particular given the subtle question around whether the offsets are guaranteed to be in-bounds, an RFC might not be a bad idea -- it'd probably make sense to discuss that together with rust-lang/rfcs#3727.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Dec 23, 2024

If in-place initialization of enums is desirable then declaring usage involving uninhabited types to be UB seems too costly to me. It would greatly complicate most uses of in-place enum construction and make it much harder to do soundly in the first place, for dubious benefit.

The problematic cases are in generic code (e.g., construction of Option<T> in terms of T: InPlaceConstruct). Ordinary testing with miri usually won’t cover uninhabited types unless someone is already specifically aware of this obscure interaction. I’ve been personally involved in discussions about this in the past and explained it to people a dozen times over the years, but I still wouldn’t bet on it coming to mind if I was writing such code. Unsafe Rust shouldn’t be that hard to get right!

Once discovered, the only proper fix is to add a special case code path to every function that constructs an enum in-place, detecting annd avoiding the out-of-bounds writes. I’m aware of two options and they both suck:

  1. You could panic immediately before trying to construct any of its fields, or at least any field with offset + field size > enum size. But this discards potential side effects from constructing the inhabited fields, which ranges from theoretically annoying to actual behavioral differences (eg, if the side effect is “receiving from a MPMC channel or TCP connection”).
  2. If you don’t want to discard side effects from the earlier fields, you have to create temporary space for them and redirect their construction to that space. For large types where in-place construction is most useful, this can easily overflow the stack. Congrats, you still didn’t do the side effects but made the crash much less debuggable.

Not only do both options kinda suck when they’re triggered, but they also add considerable extra complexity which should never be exercised by a correct program (one which does not panic or get stuck in an unproductive infinite loop). It’s a pure gotcha that makes already subtle unsafe code more verbose and thus harder to audit.

@RalfJung
Copy link
Member

If in-place initialization of enums is desirable then declaring usage involving uninhabited types to be UB seems too costly to me. It would greatly complicate most uses of in-place enum construction and make it much harder to do soundly in the first place, for dubious benefit.

Agreed, that's what I meant to say but failed to make explicit. :)

@steffahn
Copy link
Member

This is the enum version of the fact that (i32, !) has size 4.

IMO theres a huge difference between the two. For structs, normal users don’t really care, because the struct itself will be uninhabited, too, anyway. Thus it’s okay to just make the unsafe code easier, make addr_of_mut work on uninitialized data, etc… because there’s almost no real cost to it. On the other hand, for enums, the resulting types are usually inhabited.

For union, that would be the case, too, but that ship has sailed; and it’s much closer to unsafe-code users anyway (and also this relates back to tuples and structs again) - relevant code examples include this

union Foo {
    unit: (),
    other: (u8, Infallible),
}

fn main() {
    let mut x = Foo { unit: () };
    x.other.0 = 42;
}

which doesn’t even require unsafe (since Rust 1.37). Furthermore, for tuples, unions, structs, existing offset_of library macros were already able to handle them anyway.

For enums, when the resulting type is inhabited, there is an actual cost to the increased size and alignment, and the vast majority of users – especially all existing users – have zero benefits from this reserved extra space in – say – a Result<(), ConversionFailed<Infallible>> (I’m imagining generic intermediate library where ConversionFailed can wrap an error from a TryFrom impl, adding extra information.)

offset_of for enums isn’t possible yet even with library workarounds, as far as I know. Only repr(C) or repr(u…) (or repr(C, u…)) enums can already be handled by manual offset-calculations, right? Unless I’m missing something, the only enum support of offset_of that doesn’t have these significant consequences in terms of new capabilities for user code, is if its limited to such reprs. Limiting it like this is even a possibility that the RFC’s future possibilities explicitly called out. Where is the discussion and decision not to do so?

Ordinary testing with miri usually won’t cover uninhabited types unless someone is already specifically aware of this obscure interaction. I’ve been personally involved in discussions about this in the past and explained it to people a dozen times over the years, but I still wouldn’t bet on it coming to mind if I was writing such code. Unsafe Rust shouldn’t be that hard to get right!

I believe this is completely solvable. For example: offset_of can simply fail to compile in all problematic cases. It would only support enums with repr(C) (etc…), or enums where the concrete type is known to have the variant in question inhabited. In a generic context, with T not known to be inhabited:1 hard compile error. Offer an additional macro try_offset_of which returns Option<usize>. If we simply never give a user an offset at all for an uninhabited enum variant, they can’t possibly screw up. The documentation of try_offset_of would clearly explain the case it can fail, and any user ignoring this would at worst be following the “panic early” approach with a visible unwrap in their code.

I also believe that offset_of is the wrong abstraction for in-place initialization of enums. For in-place initialization of structs you don’t use offset_of; you use add_of_mut instead. While writing my simple example of

.byte_add(offset_of!(Option<(u8, Infallible)>, Some.0.0)).cast()

above I was already quite dissatisfied because I had no way of connecting the type of the (nested) field with the result type of the cast. I could typo that… this code also will not start to fail compiling if I change the enum… etc.

But add_of_mut isn’t great for in-place initialization either. Realistically we should have an alternative for add_of[_mut based on patterns. In particular: If you use add_of_mut for structs, you won’t get notified by the compiler if the struct gains additional fields. A pattern-based approach could probably solve it all: even avoid pitfalls like the issue of implicit Deref; offer a way to ensure all fields have been mentioned, support enums without the need for byte_add and manual cast, and in its try_… variant presumably also offer a way to be able to handle the failure case only once instead of having to deal with multiple Option<usize> from the try_offset_of I have explained above. [Those could be annoying because you get multiple Option values while you know that they really should all be None or all be Some simultaneously.]


Also, coming back to the process, all of this discussion seems quite out of place in a tracking issue.

Footnotes

  1. I suppose this has to be [at least initially] very restrictive actually, because it isn’t semver-breaking for a type that was never constructible through its API anyway to become uninhabited in a compiler-understood way.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Dec 23, 2024

Better ways to express manual in-place construction would be an interesting addition to the language. However, it's definitely out of scope for this tracking issue, and it doesn't really help use cases such as postcard-forth that have good reasons to work in terms of offsets.

Regarding try_offset_of - it's an interesting idea, though it still complicates unsafe code with additional early exits / panic locations. Besides the semver issue w.r.t. library types you mentioned, there's also the potential problem of the compiler getting smarter or less smart about which types it considers uninhabited for layout purposes. This is probably subtly different from e.g. whether the proposed never patterns are allowed. The exact conditions for this layout optimization may have to be cast in stone at the time try_offset_of is stabilized and never changed again. Especially if try_offset_of can be const-eval'd (it really ought to), because then any change can break the build instead of just making some erroneous code path panic a bit later or earlier.

More generally, the proposed enum layout optimization would complicate far more than just manual in-place construction. As already discussed way back in #49298 (comment), there's useful compiler optimizations (on polymorphic MIR) that would want to do essentially the same thing, and these aren't allowed to introduce an early panic to prevent the out-of-bounds writes. It would also be annoying if Result<T, !> no longer had the same layout as T for all T. For example, consider a specialization of collecting vec::IntoIter<Result<T, !>> into Result<Vec<T>, !> by type-punning the allocation into Vec<T>, moving remaining elements to the start of the buffer if some were already consumed, and returning Ok(...). I believe this is correct if Result<T, !> has the same size and alignment as T. If T being uninhabited makes Result<T, !> a 1-ZST, then it's incorrect because the dangling pointer for the (necessarily empty) vector would be unaligned for T. It's possible to ensure align_of::<Result<T, !>> == align_of::<T>, but this would sometimes increase the size of inhabited enums containing Result<T, !>, so there will be requests to not do it. It's also possible to fix by detecting this case in the specialized impl and return Vec<T>::new() in this case. But I hope this illustrates my point that it's not only in-place construction that gets more complicated.

@steffahn
Copy link
Member

The exact conditions for this layout optimization may have to be cast in stone at the time try_offset_of is stabilized and never changed again. Especially if try_offset_of can be const-eval'd (it really ought to), because then any change can break the build instead of just making some erroneous code path panic a bit later or earlier.

I think this is simply false, in the same way that is false that mem::size_of or mem::align_of, both const-evaluatable, would set in stone Rust’s layout algorithm. We’d just document it properly, and like with mem::size_of/mem::align_of, if someone uses them to break a build when a layout changes, in the same manner, if someone uses try_offset_of in a way that breaks the build if a type becomes uninhabited, that’s just improper usage of try_offset_of.

More generally, the proposed enum layout optimization would complicate far more than just manual in-place construction.

I agree that it can complicate things… I’m just not convinced that any of it would necessarily become too complicated.

@hanna-kruppe
Copy link
Contributor

With size_of and align_of, it’s typically possible and even easy to do whatever you want to do without caring what the exact value is (transmuted can break, but it’s hard to do something with transmutes that relies only on the size and isn’t also broken because other aspects of layout are undefined). Similarly, offset_of values changing is typically not breaking in practice: you’re specifically using it to not hard-code a specific offset!

But a fallible try_offset_of feels different: to be immunized against future changes from Some to None, you’d essentially have to carry the Option<usize> through all compile-time work and only unwrap the result at run time. For something like postcard-forth, which stores a whole bunch of field offsets in static data to use in runtime Rust code, that’s technically possible but not very natural and hard to stomach because it makes the static data 50% larger (three usize instead of two). There could also some cases that really need the offset to be a plain usize at compile time, e.g., as const operand for Inline assembly to use as immediate offset for load/store instructions.

@steffahn
Copy link
Member

steffahn commented Dec 23, 2024

There could also some cases that really need the offset to be a plain usize at compile time, e.g., as const operand for Inline assembly to use as immediate offset for load/store instructions.

Hopefully you only ever call this inline assembly if by then you know that the variant is inhabited. You can do .unwrap_or(isize::MAX as usize) or something like that at compile-time, at the place the value is inserted into the assembly, and hope that as an additional line of defense, the resulting inline assembly crashes your program if you mess up and call it when you weren’t supposed to.


to be immunized against future changes from Some to None, you’d essentially have to carry the Option<usize> through all compile-time work and only unwrap the result at run time. For something like postcard-forth, which stores a whole bunch of field offsets in static data to use in runtime Rust code, that’s technically possible but not very natural and hard to stomach because it makes the static data 50% larger (three usize instead of two).

Maybe Option<usize> isn’t ideal… too easy to unwrap at compile time? Size of 2*usize? If we can just not support offset values of usize::MAX, then the size problem can be solved easily I guess… perhaps some more opaque FallibleOffset type which carries a failure case through arithmetic like floats do with NaN? If we want to be maximally defensive agains improper usage, converting to Option<usize> could be unsupported at compile-time entirely, and even transmuting to usize could be prevented (like transmuting *const () to usize doesn’t work in const-eval).

In such a system, there’s an even better answer to

There could also some cases that really need the offset to be a plain usize at compile time, e.g., as const operand for Inline assembly to use as immediate offset for load/store instructions.

Const evaluation already has certain mechanisms for delayed panics, powering cases like e.g.

fn main() {
    foo(0);
}

/// panics if index > 0
fn foo(index: usize) {
    if index > 0 {
        let s: &'static String = &([][index]);
        println!("{s}");
    }
}

so a similar mechanism could power a use-case of inserting a FallibleOffset into inline assembly ~ though perhaps to avoid concerns of panic safety, it’d be inserting an abort instead of panic at the place of the inline assembly, if the fallible offset evaluates to None.

@jamesmunns
Copy link
Member

jamesmunns commented Dec 24, 2024

I'm not sure I understand all of the offset_of discussion above so I won't comment on those parts, but at least for manual enum construction, as discussed in rust-lang/rfcs#3727, we could perhaps make the use of discriminant_of! on uninhabited variants a compile-time error? (all the following logic is based on what is proposed in RFC 3727)

If the only sound way to construct the enum "from scratch" requires Discriminant<T> to give to set_discriminant, and the only ways to create that is:

  • get_discriminant, which requires an instance of the uninhabited type, which can't safely happen at runtime
  • discriminant_of!, which could refuse to compile with uninhabited types (the same behaviorally as if you asked for the Err variant of an Option<T>)

In the case of postcard-forth, if you tried to derive whatever trait that creates the metadata that stores the discriminant, it'd fail to compile, which is a good thing IMO? You could still manually produce a variant of the enum that isn't uninhabited, you just couldn't procedurally calculate "all discriminants for all variants", which seems like a palatable limitation.

@programmerjake
Copy link
Member

imo any kind of workaround by making discriminant_of or offset_of refuse to compile or something just makes it more difficult to use, since you can't use it with generic enums (since you can pass an uninhabited type in as the generic parameter), and proc macros (or really any macro) can't generate things like a construct-in-place trait or a reflection trait since you can have things like:

trait MyReflect {
    // this should reflect all variants, since sometimes you are
    // using the variants for things other than constructing enum
    // values, e.g. generating a JSON schema for serde.
    const DISCRIMINANTS: &'static [Discriminant<Self>];
}
#[derive(MyReflect, MyConstructInPlace)]
pub enum E {
    // the proc-macros *can't see* the definition of SomeType,
    // so have no way of knowing if it's uninhabited, so they can't
    // generate code that does `discriminant_of` or `offset_of` since
    // those would be compile-time errors.
    A(SomeType),
    B,
}

@jamesmunns
Copy link
Member

@programmerjake but that's kind of my point - it shouldn't be able to generate that. If a variant shouldn't ever exist, we can't receive it on the wire, which means we don't need the schema anyway. You wouldn't want to be able to construct the uninhabited variant through the reflection system either.

I think derives like MyReflect would be able to have attrs like #[schema(skip)] or so to address this, if you need to represent "impossible" variants.

@programmerjake
Copy link
Member

@programmerjake but that's kind of my point - it shouldn't be able to generate that. If a variant shouldn't ever exist, we can't receive it on the wire, which means we don't need the schema anyway. You wouldn't want to be able to construct the uninhabited variant through the reflection system either.

well, then what's it supposed to do for things like where there's no way to tell at macro expansion time or even at compilation time pre-monomorphization from the crate defining E2 if A or B or both or neither are uninhabited:

pub enum E2<T: Trait> {
    A(T),
    B(<T as Trait>::Type),
}

imo this severely restricts what proc-macros and general generic code can do for not much gain.

@programmerjake
Copy link
Member

i think it would be preferable to have enums not make the delete-uninhabited-variants'-fields optimization and, for when people want uninhabited types to not take any space, instead have a built-in type ShrinkIfUninhabited<T> kinda like ManuallyDrop<T> or PhantomData<T> that is a align-1 size-0 type iff T is uninhabited otherwise is just a wrapper containing T. so, i think it would probably be some kind of compiler magic, but if you had to express it in Rust it would be like:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f9599c80c961e883166ab077c75b43e6

#![feature(specialization)]

/// implemented by the compiler for all uninhabited types
trait Uninhabited {}

trait ShrinkIfUninhabitedSpec {
    type Type: ?Sized + BorrowMut<Self>;
    const fn new(self) -> Self::Type
    where
        Self: Sized,
        Self::Type: Sized;
    fn into_inner(v: Self::Type) -> Self
    where
        Self: Sized,
        Self::Type: Sized;
}

default impl<T: ?Sized> ShrinkIfUninhabitedSpec for T {
    type Type = T;
    fn new(self) -> T
    where
        Self: Sized,
    {
        self
    }
    fn into_inner(v: Self::Type) -> Self
    where
        Self: Sized,
    {
        v
    }
}

impl<T: ?Sized + Uninhabited> ShrinkIfUninhabitedSpec for T {
    type Type = !;
    fn new(self) -> T
    where
        Self: Sized,
    {
        unreachable!()
    }
    fn into_inner(v: Self::Type) -> Self
    where
        Self: Sized,
    {
        unreachable!()
    }
}

#[derive(Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash, Debug, Default)]
#[repr(transparent)]
pub struct ShrinkIfUninhabited<T: ?Sized>(<T as ShrinkIfUninhabitedSpec>::Type);

// accessor/constructor/into_inner/etc. impls

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Dec 24, 2024

but that's kind of my point - it shouldn't be able to generate that. If a variant shouldn't ever exist, we can't receive it on the wire, which means we don't need the schema anyway. You wouldn't want to be able to construct the uninhabited variant through the reflection system either.

As @programmerjake said, the problem is that any type parameter can be uninhabited. If discriminant_of!() is supposed to reject uninhabited variants, then this affects even Option<T> because Some is uninhabited if T is.

One “solution” is to say the compiler rejects discriminant_of!(Option<T>, Some) in generic code where T is a type parameter because it could be uninhabited. But that makes it impossible to provide e.g., impl<T: Deserialize> Deserialize for Option<T> using in-place construction. Alternatively, it could be a post-monomorphization error (only error if the concrete type eventually substituted for T is uninhabited), but Rust generally tries hard to minimize those.

As a third alternative, there could a marker trait for “T is inhabited” and generic code can only do discriminant_of!(Option<T>, Some) is this bound holds. This has the usual downsides of marker traits: vitality of bounds, semver hazard if it’s automatically implemented by the compiler, annotation burden for library authors if it’s opt-in. So I don’t think the payoff in enum layout optimization is worth the trouble of such a marker trait.

@jamesmunns
Copy link
Member

Alternatively, it could be a post-monomorphization error (only error if the concrete type eventually substituted for T is uninhabited), but Rust generally tries hard to minimize those.

Ah, thank you @hanna-kruppe, that's indeed what I was suggesting, and I wasn't aware of the desire to avoid doing that.

@GKFX
Copy link
Contributor Author

GKFX commented Dec 24, 2024

the question is whether one can assume that offset_of! on a field type T will always give an offset O such that O + size_of::<T>() does not exceed the size of the surrounding type

This seems like a very reasonable assumption - it's an assumption I would make, and I don't see how the definition given of offset_of! ("Expands to the offset in bytes of a field from the beginning of the given type.") could be interpreted differently - if the field exists at the given offset within the given type, there is surely enough space for it.

(And furthermore, the ranges covered in this way by different fields of the same variant are disjoint.) I am not sure if we even guarantee that for struct types... though we probably should as unsafe code is likely to already rely on this for field-by-field in-place initialization of a struct.

This is guaranteed for the Rust representation https://doc.rust-lang.org/reference/type-layout.html ("The fields do not overlap. ... The second guarantee means that the fields can be ordered such that the offset plus the size of any field is less than or equal to the offset of the next field in the ordering.")1 and follows from the definitions of repr(C) (each field follows the next), primitive representation (based on repr(C)) and transparent (there is only one non-1-ZST field, so nothing to overlap).

Footnotes

  1. I think the reference as written currently guarantees that the fields of unions and enums are disjoint too! The scope of the quoted section should be narrowed to refer to one enum/union variant.

@GKFX
Copy link
Contributor Author

GKFX commented Dec 24, 2024

@steffahn On procedure - the original offset_of tracking issue (#106655) was tagged T-libs-api, and https://rust-lang.github.io/rfcs/libs_changes.html describes generalizing an unstable API as "Submit a PR", which I did (offset_of was unstable when enum support was added). Nested field support was stabilized despite only having appeared in the "Future possibilities" too. I'm not an expert on Rust procedures so this may well have been incorrect.

It looks like rust-lang/unsafe-code-guidelines#443 will need to be decided one way or another before any further process can occur here, since enum support in offset_of as currently implemented and documented assumes all offsets can be correctly provided. If they can, no change required here. If they can't, then try_offset_of is needed which would presumably want a new RFC, etc.

@steffahn
Copy link
Member

It looks like rust-lang/unsafe-code-guidelines#443 will need to be decided one way or another before any further process can occur here

I believe the layout guarantees of repr(C), primitive-representation, and combined repr(C)+primitive-representation enums are stably guaranteed (in terms of struct and union), so for those, offset_of should be unproblematic, so in principle only support repr(Rust) enums depends on that question.

On procedure - the original offset_of tracking issue (#106655) was tagged T-libs-api, and https://rust-lang.github.io/rfcs/libs_changes.html describes generalizing an unstable API as "Submit a PR", which I did (offset_of was unstable when enum support was added). Nested field support was stabilized despite only having appeared in the "Future possibilities" too. I'm not an expert on Rust procedures so this may well have been incorrect.

@GKFX To clarify, I asked mainly out of curiosity, not to criticize anybody in any way. I’m also not an expert on Rust procedures. Thank you for your efforts to help get a more capable offset_of stabilized!

I’d argue that probably this stabilization with full enum support (including repr(Rust)) isn’t really a typical case of “Generalization of a stable API”, because it adds new capabilities to the language Rust itself, though that isn’t necessarily obvious just from looking at it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. F-offset_of_enum `#![feature(offset_of_enum)]` I-lang-nominated Nominated for discussion during a lang team meeting. proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests