Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lang: Add QuickCheck<T> that checks owner (and other invariants) quickly, avoiding complete deserialization and consuming memory #2670

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

snawaz
Copy link
Contributor

@snawaz snawaz commented Oct 14, 2023

for best experience, review commit by commit.

Problem

If a program requires few accounts to be used in CPI calls (or any scenario that uses .to_account_info() only on the accounts), then the developers have 3 choices:

choice1: Account<'info, Mint>,

 ///CHECK: REQUIRED
choice2: UncheckedAccount<'info>,

 ///CHECK: REQUIRED
choice3: AccountInfo<'info>,
  • choice1 is safe but it's overkill if program never needs to the Mint object itself. It could be slow, as it completely deserializes the account. The biggest problem is, it consumes stack (or heap) memory unnecessarily!
  • choice2 and choice3 are fast (no deserialization at all) but unsafe — and requires CHECK: doc comment as well.

All choices are suboptimal — choice1, despite being safe, is the worst, as it's memory hungry!

Solution

QuickCheck<T> solves this satisfactorily (at least to me and we're using this solution in production).

safe_and_fast_both: Account<'info, QuickCheck<Mint>>,
  • It's both safe and fast, as it checks the owner of the account (and if there is any discriminator, that will be automatically checked by the Anchor framework already).
  • It does not need CHECK: doc comment either.

More

In some cases, we do not have any#[account] types (such as Mint type above), instead we have AccountInfo only but we still want to (rather must) check owner of such account_infos. To support this, this PR also implements trait Owner for the program type that enables us to write this:

#[program]
mod candy_machine { //  generates CandyMachine type to represent thos program
    // .. 
}

// instead of
// some_account: AccountInfo<'info>,
// now we can write this: 
some_account: Account<'info, QuickCheck<CandyMachine>>,  

In this code, CandyMachine is not account though — it's a program type. But we can use it to ensure that some_account is owned by CandyMachine.

Tests

I've added tests to cover both scenarios discussed above.

Docs

If this PR looks good to reviewers (and eventually acceptable), I'll also add the docs. So please let know.

@vercel
Copy link

vercel bot commented Oct 14, 2023

@snawaz is attempting to deploy a commit to the coral-xyz Team on Vercel.

A member of the Team first needs to authorize it.

@snawaz
Copy link
Contributor Author

snawaz commented Oct 14, 2023

This PR solves the problem in better way than a similar PR (#1475 created a long back) tried to solve (which does not compile, as noted by @armaniferrante ).

Copy link
Collaborator

@acheroncrypto acheroncrypto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comprehensive explanation.

safe_and_fast_both: Account<'info, QuickCheck<Mint>>,

How does this compare to the existing owner constraint?

#[account(owner = Mint::owner())]
my_account: UncheckedAccount<'info>,

@Aursen
Copy link
Contributor

Aursen commented Oct 14, 2023

Why do you not use a new account type for such cases? It would improve the readability of the code. Is this intentional?

@snawaz
Copy link
Contributor Author

snawaz commented Oct 14, 2023

@acheroncrypto

How does this compare to the existing owner constraint?

#[account(owner = Mint::owner())]
my_account: UncheckedAccount<'info>,

QuickCheck<T> has 4 (small) advantages:

  • You do not need to write CHECK: doc comment. With UncheckedAccount<>, you still need to write CHECK: doc comment. The above code wont compile!
  • From readability point of view, despite putting CHECK: and the constraints as you used, it's often unclear as to what else needs to be checked and what else is missed, as you pointed out here. With QuickCheck<>, we can be assured that except the deserialization, all checks that apply to Account<'info, Mint> also apply to Account<'info, QuickCheck<Mint>>, e.g do we need to check discriminator or not? The framework can automate this, without needing the users to think about the checks again and again.
  • The mere presence of UncheckedAccount in the derive(accounts) structs feels uncomfortable and unsafe (which is precisely why CHECK doc comments are enforced strictly).
    • analogy: It feels like unsafe { .. } block, you get red alerts immediately, whereas QuickCheck<> feels like normal code without unsafe { .. } blocks. All members of derive(accounts) look similar and similarly safe/checked!
  • UncheckAccount is less readable as we do not see the type (i.e Mint) it is meant to represent. I understand that the type is being used in the constraints just above it, but you need to kinda scan other parts of the code to see it. Also, if someone writes the constraint as #[account(owner = spl_token::ID)] which is the same check, but now the type Mint has disappeared from the scene and even scanning other parts wont help.

In terms of speed and quickness, it has zero advantage.

@snawaz
Copy link
Contributor Author

snawaz commented Oct 14, 2023

@Aursen

Why do you not use a new account type for such cases? It would improve the readability of the code. Is this intentional?

How would you write an equivalent of Account<'info, QuickCheck<Mint>> using new account type? (Mint is anchor_spl::token::Mint).

How about Account<'info, QuickCheck<TokenAccount>> ?

If I use new account (wrapper?) type approach more than a couple of times, I'd quickly start feeling the verbosity without much readability improvement. QuickCheck is already readable, as you can see the real account type, meant to be used if you needed full deserialized account.

@Aursen
Copy link
Contributor

Aursen commented Oct 15, 2023

@Aursen

Why do you not use a new account type for such cases? It would improve the readability of the code. Is this intentional?

How would you write an equivalent of Account<'info, QuickCheck<Mint>> using new account type? (Mint is anchor_spl::token::Mint).

How about Account<'info, QuickCheck<TokenAccount>> ?

If I use new account (wrapper?) type approach more than a couple of times, I'd quickly start feeling the verbosity without much readability improvement. QuickCheck is already readable, as you can see the real account type, meant to be used if you needed full deserialized account.

Maybe my question is not very clear. I just mean this syntax:

CpiAccount<'info, Mint>

Against

Account<'info, QuickCheck<Mint>>

@snawaz
Copy link
Contributor Author

snawaz commented Oct 15, 2023

Maybe my question is not very clear. I just mean this syntax:

CpiAccount<'info, Mint>

ah, I see. I thought about this solution as well. I've created another PR which implements Immutable<> account type (because the current Account<> type does not enforce compile-time immutability) and my experience is that adding a new account type requires more coding and more syncing (with other account types). I personally do not see that as problem because the syncing issue can be solved by extracting out the common code in a macro or so and use that.

However, @acheroncrypto probably thinks otherwise (as per his feedbacks on the other PR). Let me know what your current thoughts are. I'd happily work on that, if that is more desirable! (and hoping then my Immutable<> PR is approved as well).

We could try implementing a third-solution:

#[account(quick_check)]
safe_and_fast: Account<'info, Mint>,

But my guess is that all these alternative solutions require more code and more changes to the existing codebase.

@acheroncrypto
Copy link
Collaborator

@acheroncrypto

How does this compare to the existing owner constraint?

#[account(owner = Mint::owner())]
my_account: UncheckedAccount<'info>,

QuickCheck<T> has 4 (small) advantages:

  • You do not need to write CHECK: doc comment. With UncheckedAccount<>, you still need to write CHECK: doc comment. The above code wont compile!

CHECK: comment is just a best practice recommendation by the framework - it can be skipped easily from either command arguments or workspace level setting in Anchor.toml if you don't want to document why you're not checking the account type.

  • From readability point of view, despite putting CHECK: and the constraints as you used, it's often unclear as to what else needs to be checked and what else is missed, as you pointed out here. With QuickCheck<>, we can be assured that except the deserialization, all checks that apply to Account<'info, Mint> also apply to Account<'info, QuickCheck<Mint>>, e.g do we need to check discriminator or not? The framework can automate this, without needing the users to think about the checks again and again.

Overall fair point.

  • The mere presence of UncheckedAccount in the derive(accounts) structs feels uncomfortable and unsafe (which is precisely why CHECK doc comments are enforced strictly).

    • analogy: It feels like unsafe { .. } block, you get red alerts immediately, whereas QuickCheck<> feels like normal code without unsafe { .. } blocks. All members of derive(accounts) look similar and similarly safe/checked!

Well, that's the intent because it's unsafe! While it may be true that using Account<'info, QuickCheck<Mint>> "look similar and similarly safe/checked", looks can be deceiving. The type is not validated, so it's still considered unsafe.

To illustrate this, let's take a test case you've added:

fn test_quick_check_good_case() {
let mut good_mint_data = ([0; 1024], 10959, Mint::owner());
let good_mint_info = new_account_info(&mut good_mint_data);
let good_mint = Account::<QuickCheck<Mint>>::try_from(&good_mint_info);
assert!(good_mint.is_ok());
assert_eq!(*good_mint.unwrap(), deserialized_value());
}

-        let mut good_mint_data = ([0; 1024], 10959, Mint::owner());
+        struct WrongAccount;
+        impl anchor_lang::Owner for WrongAccount {
+            fn owner() -> Pubkey {
+                spl_token::ID
+            }
+        }
+
+        let mut good_mint_data = ([0; 1024], 10959, WrongAccount::owner());

Make these changes and the test would still pass even though we are using WrongAccount.

  • UncheckAccount is less readable as we do not see the type (i.e Mint) it is meant to represent.
    Similar to the last point, Account<'info, QuickCheck<Mint>> looks like it represent Mint but one could supply a TokenAccount and it would still work - so what's the type here? This is saying it's type Mint but not actually checking if it's type Mint. I'd say it's very hard to tell what Account<'info, QuickCheck<Mint>> does meanwhile #[account(owner = spl_token::ID)] is very explicit in what it does. I think we have quite different definitions for what's readable, for me
#[account(owner = spl_token::ID)]
my_account: UncheckedAccount<'info>,

is much more readable and understandable compared to

my_account: Account<'info, QuickCheck<Mint>>,

I understand that the type is being used in the constraints just above it, but you need to kinda scan other parts of the code to see it.

Given the idiomatic way of doing account validation in Anchor is with #[account] macro, I'd argue looking at inside the #[account] macro is not "scanning other parts of the code" because that's the default location one is supposed to look to see the account constraints.

Also, if someone writes the constraint as #[account(owner = spl_token::ID)] which is the same check, but now the type Mint has disappeared from the scene and even scanning other parts wont help.

The type Mint in Account<'info, QuickCheck<Mint> was never there in the first place as it's not being checked.


At the end of the day, this all comes down to personal preference. Anchor, being an opinionated framework, has idiomatic way of doing things and allowing multiple ways of doing the same thing usually results in people getting confused rather than the intended effect of flexibility.

@snawaz
Copy link
Contributor Author

snawaz commented Oct 18, 2023

Well, that's the intent because it's unsafe! While it may be true that using Account<'info, QuickCheck<Mint>> "look similar and similarly safe/checked", looks can be deceiving. The type is not validated, so it's still considered unsafe.

To illustrate this, let's take a test case you've added:

fn test_quick_check_good_case() {
let mut good_mint_data = ([0; 1024], 10959, Mint::owner());
let good_mint_info = new_account_info(&mut good_mint_data);
let good_mint = Account::<QuickCheck<Mint>>::try_from(&good_mint_info);
assert!(good_mint.is_ok());
assert_eq!(*good_mint.unwrap(), deserialized_value());
}

-        let mut good_mint_data = ([0; 1024], 10959, Mint::owner());
+        struct WrongAccount;
+        impl anchor_lang::Owner for WrongAccount {
+            fn owner() -> Pubkey {
+                spl_token::ID
+            }
+        }
+
+        let mut good_mint_data = ([0; 1024], 10959, WrongAccount::owner());

Make these changes and the test would still pass even though we are using WrongAccount.

Yes, the test would pass because QuickCheck<> does not check the discriminator currently — I've only implemented the basic idea, not everything. As I said earlier, Account<'info, QuickCheck<Mint>> must check EVERYTHING that Account<'info, Mint> does, except the former does not deserialize the type itself.

Now if you think deserialization ensures some safety, then that's not true at all. Deserialization merely interprets buffer as the given type and for most types you'll get some value (even if it does not make sense to you) without any failure — it might fail at later stage though when you try to use the deserialized type but that's a different topic altogether.

The following test defines 4 types, one of which is Correct type, the rest are Wrong types. If you replace these types with the generated code (by the macro), and then comment the discriminator checks for all the types, the test would pass (this gist shows the complete test)!

    #[account]
    #[derive(InitSpace, Debug)]
    pub struct Correct {
        pub simple: u32,
        #[max_len(4)]
        pub complicated: Vec<u32>,
    }

    #[account]
    #[derive(InitSpace, Debug)]
    pub struct Wrong1 {
        pub same_size_array: [u32; Correct::INIT_SPACE],
    }

    #[account]
    #[derive(InitSpace, Debug)]
    pub struct Wrong2 {
        pub smaller_size_array: [u32; Correct::INIT_SPACE / 2],
    }

    #[account]
    #[derive(InitSpace, Debug)]
    pub struct Wrong3 {
        pub signer_bump: u8,
        pub usdc_balance: u64,
        pub deposit_count: u32,
        pub withdraw_count: u32,
    }

    #[test]
    fn test_deserialization_does_not_ensure_safety() {
        let mut data = ([0; 1024], 10959, crate::ID);
        let info = new_account_info(&mut data);

        let correct = Account::<Correct>::try_from(&info);
        assert!(correct.is_ok(), "correct: {:?}", correct);

        let wrong1 = Account::<Wrong1>::try_from(&info);
        assert!(wrong1.is_ok(), "wrong1: {:?}", wrong1);

        let wrong2 = Account::<Wrong2>::try_from(&info);
        assert!(wrong2.is_ok(), "wrong2: {:?}", wrong2);

        let wrong3 = Account::<Wrong3>::try_from(&info);
        assert!(wrong3.is_ok(), "wrong3: {:?}", wrong3);
    }

You can think of this another way: if the deserialization to the type would have provided any safety-guarantee, then Anchor would not have the notion of "discriminator". It's the discriminator which ensures safety!

Also, impl anchor_lang::Owner for WrongAccount would be generated by Anchor, so we wont have wrong owner as your example shows.

@snawaz
Copy link
Contributor Author

snawaz commented Oct 18, 2023

Given the idiomatic way of doing account validation in Anchor is with #[account] macro, I'd argue looking at inside the #[account] macro is not "scanning other parts of the code" because that's the default location one is supposed to look to see the account constraints.

Yes, I thought of using #[account(validation..)] approach, as I mentioned the following in one of my comment earlier:

#[account(quick_check)]
safe_and_fast: Account<'info, Mint>,

But later I realized, it wont work, because all of the members of Account<'info, Mint> still needs to be initialized properly but one member account: T cannot be initialized without deserialization which we want to avoid. If we change the definition of Account, and keep the member account behind a pointer-like thing (Rc or Option), then maybe that'd work. But I don't (currently) think that's a better solution.

Also, if someone writes the constraint as #[account(owner = spl_token::ID)] which is the same check, but now the type Mint has disappeared from the scene and even scanning other parts wont help.

The type Mint in Account<'info, QuickCheck<Mint> was never there in the first place as it's not being checked.

In the previous comment, I explained the type Mint will be checked using its discriminator. Deserialization is not necessary — neither does it ensure any safety!

@acheroncrypto
Copy link
Collaborator

Also, if someone writes the constraint as #[account(owner = spl_token::ID)] which is the same check, but now the type Mint has disappeared from the scene and even scanning other parts wont help.

The type Mint in Account<'info, QuickCheck<Mint> was never there in the first place as it's not being checked.

In the previous comment, I explained the type Mint will be checked using its discriminator. Deserialization is not necessary — neither does it ensure any safety!

Is the Mint discriminator you've been talking about in the room with us right now?

@snawaz
Copy link
Contributor Author

snawaz commented Oct 19, 2023

Is the Mint discriminator you've been talking about in the room with us right now?

I took it as example. If you're referring to the real anchor_spl::Mint, then it's not with us, in which case Account<'info, Mint> is not any more safer than Account<'info, QuickCheck<Mint>> either, as neither will check for its discriminator!

  • Like I said earlier, Account<'info, QuickCheck<Mint>> must check EVERYTHING that Account<'info, Mint> does.

Does Account<'info, Mint> check for its discriminator? No. It does not have any discriminator to begin with. That means, the data which gets deserialized into Mint will also get deserialized into WrongAccount as well, which you defined earlier.

struct WrongAccount; 
// some impl deserialization
impl anchor_lang::Owner for WrongAccount {
      fn owner() -> Pubkey {
          spl_token::ID
      }
 }

@acheroncrypto
Copy link
Collaborator

Is the Mint discriminator you've been talking about in the room with us right now?

I took it as example. If you're referring to the real anchor_spl::Mint, then it's not with us, in which case Account<'info, Mint> is not any more safer than Account<'info, QuickCheck<Mint>> either, as neither will check for its discriminator!

This is incorrect — not checking the non-existent discriminator does not make them equally safe or unsafe. Account<'info, Mint> will fail to deserialize if the account is not a Mint account unlike Account<'info, QuickCheck<Mint>> which will NOT fail if the account is not a Mint account.

The following will ONLY accept Mint account owned by the spl-token program(safe):

#[derive(Accounts)]
pub struct MintAccountIx<'info> {
    pub mint: Account<'info, Mint>,
}

in contrast, the following(with QuickCheck) will accept ANY account owned by the spl-token program(unsafe):

#[derive(Accounts)]
pub struct QuickCheckMintAccountIx<'info> {
    pub quick_check_mint: Account<'info, QuickCheck<Mint>>,
}

Not having a discriminator does not make an account unsafe as Native programs have their own way of making sure the accounts are uniquely deserialized. We also do not blindly interpret the account data for the wrapper accounts like you might assume. As shown by the example above, inputting a token account to the MintAccountIx will fail with a deserialization error(as it should) but QuickCheckMintAccountIx will NOT fail. Therefore, your implementation of QuickCheck is unsafe.

Which part of being able to input a different account type than the one specified in the program being unsafe is unclear to you?

@snawaz
Copy link
Contributor Author

snawaz commented Oct 19, 2023

This is incorrect — not checking the non-existent discriminator does not make them equally safe or unsafe. Account<'info, Mint> will fail to deserialize if the account is not a Mint account unlike Account<'info, QuickCheck<Mint>> which will NOT fail if the account is not a Mint account.

You missed the following two points that I mentioned about the verification:

  • Account<'info, QuickCheck<Mint>> must check EVERYTHING that Account<'info, Mint> does.
    • The entire deserialization is not necessary. Only few checks need to be done. Also, the Mint object does not need to be stored (and occupy memory) if you think deserialization is the only way to verify it.
  • The current implementation of QuickCheck does not check everything yet. I have said it before that once we agree with the idea, we can implement the rest of the things.

Therefore, your implementation of QuickCheck is unsafe.

Yes, because it's not complete.

Here are my thoughts: trait AccountVerification

Currently the deserialization does two things:

  • Verify the correctness of account.
    • Anchor verifies accounts using the discriminator approach.
    • Native programs verify using specific bits/bytes and the length of the data. The rest of the buffers do not contribute to any verification and thus can be safely ignored!
    • User's programs could verify their selected accounts in their own way, like zero-copy accounts or accounts which do not have discriminator.
  • Interpret the buffer into the given type. This part does not contribute to any verification (by definition).

If we separate the first step into its own trait, say trait AccountVerification, that can be used by QuickCheck to verify the accounts without full-deserialization.

In some corner cases where looking for specific bytes that contribute to verification is hard, OR there are too many such bytes, OR the full-deserialization can be quick (because the account is tiny) then deserialization can be done, to verify it and discard the deserialized object immediately to free memory, as it's not necessary to keep it around.

  • That means, QuickCheck can do full-deserialization (if we do not agree on AccountVerification) and discard the deserialized object immediately (if successful), that'd still be a huge improvement as it'd save lots of heap/stack memory! But I'm looking for a better and faster solution.

All I'm looking for a way to avoid consuming stack/heap memory unnecessarily, as I mentioned in my first post it's the biggest problem that can be solved using such approach.

@snawaz
Copy link
Contributor Author

snawaz commented Oct 19, 2023

@acheroncrypto and @Aursen

I changed the implementation and now it uses full-deserialization to verify the accounts, just like Account<'info, Mint>. However, unlike Account<'info, Mint>, it's still memory efficient as it discards the deserialized object immediately, to save memory! The name QuickCheck<Mint>, however, is not good now. Maybe Check<Mint> would be good.

An alternative solution could to have Account<>-like type but that does not store the object.

struct CheckedAccount<'info, T> {
     info: AccountInfo<'info>,
    _marker: PhantomData<T>,
}

It's much more work though.

@snawaz
Copy link
Contributor Author

snawaz commented Oct 23, 2023

I wonder why an unrelated code is failing due to clippy. Looks like something has changed on the ci, maybe due to the upstream dependencies?

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants