HBASE-28513 The StochasticLoadBalancer should support discrete evaluations #6543
Still cleaning this up with the help of the build logs, so I'll mark this as a draft for now. I believe the code is working quite well though, so please feel free to review the proposal and the meat of the changes. I'm still deciding whether it's necessary to create a balancer candidate generator for the replica conditional.
This is working really well in my testing, and I'm not convinced that it's necessary to add a replica distribution candidate generator. This is because each region replica typically has so many acceptable destinations (n - r + 1, where n is the number of servers and r is the number of replicas) and so many acceptable swap candidates (any region that does not represent the same data). This is different from, say, a table isolation conditional, where we really want to drain virtually all regions from a single RegionServer and no swaps are appropriate.

This is probably work for a separate PR, but I think it would be nice to support pluggable candidate generators to pair with any custom conditionals that users write.
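For what it's worth, a pluggable hook might look something like the sketch below. This is purely hypothetical: the interface name is mine, not something this PR introduces, and `BalanceAction`/`BalancerClusterState` are the balancer's existing internal types.

```java
package org.apache.hadoop.hbase.master.balancer;

/**
 * Hypothetical SPI for user-supplied candidate generators, sketching the
 * pluggability idea above. Nothing like this exists in the codebase today.
 */
public interface ConditionalCandidateGenerator {
  /**
   * Propose a single move or swap for the balancer to evaluate against the
   * custom conditional this generator is paired with.
   */
  BalanceAction generate(BalancerClusterState cluster);
}
```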
See my design doc here
To sum it up, the current load balancer isn't great for what it's supposed to do now, and it won't support all of the things that we'd like it to do in a perfect world.
Right now: primary replica balancing squashes all other considerations. The default weight for one of the several cost functions that factor into primary replica balancing is 100,000, while the default read request cost is 5. The result is that the load balancer, out of the box, basically doesn't care about balancing actual load. To solve this, you can either set primary replica balancing costs to zero, which is fine if you don't use read replicas, or, if you do use read replicas, you can try to produce a magic incantation of configurations that works just right, until your needs change.
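For concreteness, here's a sketch of the zero-it-out workaround. To the best of my knowledge these are the relevant stochastic balancer cost keys and defaults, but double-check them against your HBase version:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ZeroReplicaCostsSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Defaults are believed to be 100,000 (host) and 10,000 (rack). Zeroing
    // them stops replica placement from drowning out every other cost...
    conf.setFloat("hbase.master.balancer.stochastic.regionReplicaHostCostKey", 0f);
    conf.setFloat("hbase.master.balancer.stochastic.regionReplicaRackCostKey", 0f);
    // ...but then nothing keeps replicas of the same region apart.
  }
}
```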
In the future: we'd like a lot more out of the balancer. System table isolation, meta table isolation, colocation of regions based on start key prefix similarity (this is a very rough idea at the moment, and not touched in the scope of this PR). Supporting all of these features with either cost functions or RS groups would be a real burden. I think what I'm proposing here will be a much, much easier path for HBase operators.
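To make the idea of a conditional concrete: where a cost function returns a weighted score that competes with every other score, a conditional is a hard, discrete predicate on a proposed move. The sketch below is illustrative only; the names are mine, not necessarily the API this PR adds.

```java
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.master.RegionPlan;

/** Illustrative only: a discrete predicate on a proposed move. */
interface ConditionalSketch {
  /** True if applying this plan would violate the invariant. */
  boolean isViolating(RegionPlan plan);
}

/** Hypothetical meta isolation: reject moving non-meta regions onto the meta host. */
class MetaIsolationSketch implements ConditionalSketch {
  private final ServerName metaHost;

  MetaIsolationSketch(ServerName metaHost) {
    this.metaHost = metaHost;
  }

  @Override
  public boolean isViolating(RegionPlan plan) {
    // A violating move is rejected outright rather than merely penalized,
    // so there is no weight to tune against other cost functions.
    return plan.getDestination().equals(metaHost)
      && !plan.getRegionInfo().isMetaRegion();
  }
}
```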
New features
This PR introduces some new features: replica distribution via balancer conditionals, system table isolation, and meta table isolation.
These can be controlled via new balancer configuration properties.
Testing
I wrote a lot of unit tests to validate the functionality here — both lightweight and some minicluster tests. Even in the most extreme cases (like, system table isolation + meta table isolation enabled on a 3 node cluster, or the number of read replicas == the number of servers) the balancer does what we'd expect.
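As a rough illustration of the minicluster setup described above; the conditional property names below are placeholders, since I'm not quoting the PR's actual keys:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseTestingUtility;

public class ConditionalsMiniClusterSketch {
  // Placeholder key names; substitute the properties this PR actually defines.
  static final String ISOLATE_SYSTEM_TABLES_KEY = "placeholder.system.table.isolation";
  static final String ISOLATE_META_KEY = "placeholder.meta.table.isolation";

  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    Configuration conf = util.getConfiguration();
    conf.setBoolean(ISOLATE_SYSTEM_TABLES_KEY, true);
    conf.setBoolean(ISOLATE_META_KEY, true);
    util.startMiniCluster(3); // the "extreme" 3 node case mentioned above
    try {
      // ... create tables, run the balancer, assert region placements ...
    } finally {
      util.shutdownMiniCluster();
    }
  }
}
```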
Replica Distribution Improvements
Not only does this PR offer an alternative means of distributing replicas, but it's actually a massive improvement on the existing approach.
See the Replica Distribution testing section of my design doc. Cost functions never successfully balance 3 replicas across 3 servers OOTB — but balancer conditionals do so expeditiously.
To summarize the testing, we have `replicated_table`, a table with 3 region replicas. The 3 regions of a given replica share a color, and there are also 3 RegionServers in the cluster. We expect the balancer to evenly distribute one replica per server across the 3 RegionServers.

Cost functions don't work:
…omitting the meaningless snapshots between 4 and 27…
At this point, I just exited the test because it was clear that our existing balancer would never achieve true replica distribution.
But balancer conditionals do work:
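Since the design doc's snapshots aren't reproduced here, this is the invariant being enforced, sketched against the public Admin API (method names to the best of my recollection; verify against your client version):

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.client.RegionReplicaUtil;

public final class ReplicaDistributionCheck {
  /** Fails if any RegionServer hosts two replicas of the same region. */
  static void assertReplicasDistributed(Admin admin) throws Exception {
    for (ServerName server : admin.getRegionServers()) {
      Set<String> hosted = new HashSet<>();
      for (RegionInfo region : admin.getRegions(server)) {
        // Two replicas "represent the same data" iff they share a default replica.
        String primary =
          RegionReplicaUtil.getRegionInfoForDefaultReplica(region).getEncodedName();
        if (!hosted.add(primary)) {
          throw new AssertionError(server + " hosts multiple replicas of " + primary);
        }
      }
    }
  }
}
```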
New Features: Table Isolation Working as Designed
See below where we ran a new unit test, `TestLargerClusterBalancerConditionals`, and tracked the locations of regions for 3 tables across 18 RegionServers:
All regions began on a single RegionServer, and within 4 balancer iterations we had a well-balanced cluster and isolation of key system tables. This took about 2 minutes on my local machine, most of which was spent bootstrapping the minicluster.
cc @ndimiduk @charlesconnell @ksravista @aalhour