Skip to content
This repository has been archived by the owner on Apr 19, 2024. It is now read-only.

Feature/atomic chaining #159

Closed
wants to merge 7 commits into from
Closed

Conversation

tassm
Copy link

@tassm tassm commented Nov 11, 2022

📝 Atomic chaining of limiters!

Hi all, have had a go at implementing atomic chaining of limiters, keen for feedback or suggestions on how it could be improved.

  1. Added ATOMIC_REQUESTS as a UnionBehaviour as suggested by @thrawn01
  2. Added a flag on the RateLimitReq proto to avoid changing all the function signatures (not sure this was the best idea)
  3. Modified the algorithms to support checking limits without modifying the actual limit values
  4. Added an integration test and another missing test

💎 Closes #45 💎

@tassm tassm requested a review from Baliedge as a code owner November 11, 2022 06:25
@Baliedge Baliedge requested a review from thrawn01 November 11, 2022 14:36
Copy link
Contributor

@Baliedge Baliedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution! Looking forward to your feedback on the review.

algorithms.go Outdated
@@ -251,13 +265,22 @@ func tokenBucketNewItem(ctx context.Context, s Store, c Cache, r *RateLimitReq)
ResetTime: expire,
}

// if we are doing a check during atomic chaining then remaining number is actually the limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see a separation of concerns around checking a bucket vs. updating the bucket. Because now there's two places where the bucket's counters are updated as a shared responsibility. Can we have the update logic pulled out and called by gubernator's behavior logic and not by tokenBucket/leakyBucket?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's a good point, I will move the logic to persist hits into a new function

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: I've decoupled the update logic from the actual algorithms into a new function which is invoked from peer_client.go following the actual "check" using the algorithm in the case the request is not part of an atomic chain. hopefully this is the kind of separation you were thinking of. I have also extracted the cache and store loading logic into a common function.

gubernator.go Outdated
@@ -219,16 +219,43 @@ func (s *V1Instance) GetRateLimits(ctx context.Context, r *GetRateLimitsReq) (re
Responses: make([]*RateLimitResp, len(r.Requests)),
}

// get the UnionBehaviour
var atomicRequests = HasUnionBehavior(r.GetBehavior(), UnionBehavior_ATOMIC_REQUESTS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Combine these two lines?

@@ -127,9 +129,21 @@ enum Behavior {
// least 2 instances of Gubernator.
MULTI_REGION = 16;


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Excess whitespace

@@ -47,6 +47,8 @@ service V1 {
// Must specify at least one Request
message GetRateLimitsReq {
repeated RateLimitReq requests = 1;
UnionBehavior Behavior = 2;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Excess whitespace

@tassm tassm requested review from Baliedge and removed request for thrawn01 November 13, 2022 22:44
Copy link
Contributor

@thrawn01 thrawn01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am SO SORRY this review took so long for me too look at. I didn't even realize this PR was here until I looked at the repo this morning.

This is a great change, but in order for it to actually be atomic we will need to make a few more changes. See my other comment in V1Instance.GetRateLimits()

Again, I apologize for the tardy review.

I'm thinking of something like this in V1Instance.GetRateLimits()

if HasUnionBehavior(r.GetUnionBehavior(), UnionBehavior_FORWARDED_UNION) {
	// TODO: Preform an atomic check of all the rate limits here
}

// perform a check first if the request is for union behaviour
if HasUnionBehavior(r.GetUnionBehavior(), UnionBehavior_ATOMIC_REQUESTS) {
	// TODO: Collect all the unique keys for each ratelimit and combine them to create our union unquie key
	// this will be used to identify which instance in the cluster will handle the atomic check
	key := "combination of all keys"
	// TODO: Make `GetUnionPeer()` which will retry if a peer is leaving the cluster at the time we request a lookup.
	peer, err := s.GetUnionPeer(ctx, key)
	if err != nil {
		// TODO: Handle error
	}

	// TODO: Mark the forwarded union as forwarded, so we don't preform the same check on the remote instance
	r.UnionBehavior = UnionBehavior_FORWARDED_UNION

	// TODO: Forward and wait for a response
	return &resp, nil
}

Thoughts?

@@ -47,6 +47,7 @@ service V1 {
// Must specify at least one Request
message GetRateLimitsReq {
repeated RateLimitReq requests = 1;
UnionBehavior Behavior = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be UnionBehavior UnionBehavior = 2;

}
}
}
// if atomic check is valid then run as we normally would
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a possible race condition where we check the limits, but by the time we are ready to apply the limits some other instance has incremented the limits we are interested in.

One solution would be to have the union of rate limits all be owned by the same instance such that if we see the union behavior flag set, then we send the complete list of rate limits to a single owning instance which can then atomically check all the ratelimits in the list (by locking the entire cache).

Does that make sense?

@thrawn01 thrawn01 self-assigned this Feb 22, 2023
@thrawn01 thrawn01 added the enhancement New feature or request label Feb 22, 2023
@thrawn01
Copy link
Contributor

I'm so sorry this code change hasn't worked out. This would be a very nice feature to have, but implementing it in a race condition safe way is going to be a challenge.

The way forward is to either
A. Create a union of the rate limits such that one instance owns the union (thus all the ratelimits) and can be worked on in a batch
B. Add a new Peer call which reserves each limit on the owning instances and if any fail to reserve then all rate limits can be reverted.

I'm about to make some backward incompatible changes to the peer clients and the worker pool which WILL make implementing this feature easier, but will change so much that this PR as written would need to be re-done.

I want to keep this code around in the branch as reference, but I will close this PR until we have a good direction forward.

I hate to throw away good code, but in this instance, I think we must at least for now.

@thrawn01 thrawn01 closed this Aug 17, 2023
@tassm
Copy link
Author

tassm commented Aug 17, 2023

Hey @thrawn01, no worries. Unfortunately I've been focused elsewhere the last few months and haven't managed to revisit this. I agree it is a more complex challenge than I originally thought. Looking forward to seeing your changes and maybe I can help you revisit this feature in the future.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Atomic Chaining of Limiters
3 participants