-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 4598 - Copy attribute syntax hashtables to each worker thread #5693
base: main
Are you sure you want to change the base?
Conversation
Still reviewing but I have a general comment:
|
This is technically still a WIP. Need to do some performance testing with it. I just wanted to get the working code into a PR :-)
Sounds good, I'll work on that next... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except a few minor points, its looks fine.
I hope we will see the impact in performance test.
P.S: Avoid doing massive code cleanup in a PR impacting sensitive code:
It really makes the review much harder:
For example I am not absolutely sure about what was really changed in schema.c
because the cleanup just hide the functional changes ! -:(
( I think it is all around the DSE_SCHEMA_USE_GLOBAL flags but I am not absolutely sure.)
I know I know, the cleanup started off simple and small, and then it turned into a can of worms. But yes in schema.c the main change was to add a flag so we always update/access the global hash tables and not the local thread hash tables. |
a3ad9c7
to
f06350c
Compare
All changes made. I hope to get performance testing done next week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Wouldn't contention on an rwlock imply that there are attempts to frequently write to these values? Rwlocks with .read() are effectively free since it becomes an atomic ref count. How did you determine there was contention here? |
Thierry did this investigation in the parent Issue: Using modrate, and removing these RWlock gave a 5% performance increase. I'm not sure what tool he used to measure the contention but that report is also in the parent issue. |
I do not think that we are speaking of the usual lock contention here (as the lock is never held in write mode) |
Memory cache contention doesn't make sense though, because the pages will be in the shared state in the various cpu caches. By splitting this we'll be putting more pressure on the cpu caches because we are no longer sharing the pages. So I'm still really curious about this and the assumption about the cause that is occuring. |
As I told the you the perf c2c tool reports these as a the top cache contention so there are contention even if we do not understand why it happens ! -;) |
I would certainly trust perf here, but I think understanding why something happens is going to give a better solution than just splitting up a structure and copying it (which might reduce the contention but it's going to cause data-duplication in the cache meaning we get worse cache sharing and overall worse performance). So understanding the problem seems like a better first step :) |
Initial Performance ResultsThis was all done on a single system (client and server) with 16 cores Import1-2% performance increase (only tested bdb) Modratemodrate --hostname localhost --port 389 --bindDN 'cn=dm' --bindPassword XXXXXXX --entryDN "uid=user[1-10000],ou=People,dc=example,dc=com" --attribute givenname --attribute mail --valueLength 12 --numThreads 5 No improvement Searchratesearchrate --hostname localhost --port 389 --bindDN 'cn=dm' --bindPassword XXXXXX --baseDN dc=example,dc=com --scope sub --filter FILTER --attribute givenName --attribute sn --attribute mail --numThreads 10 Single Filter: (uid=[1-10000])19% improvement Compound Filter: (&(uid=[1-10000])(objectclass=posixaccount))15% improvement ConclusionThe original issue said "modrate" got an improvement, but I don't see that. However the search improvement is significant! |
Also... For the import test I also tried adding hashtable copies to the bdb_import_foreman(), bdb_import_producer(), and bdb_public_bdb_import_main() threads, but it did not seem to help import performance (with bdb). |
Well I was playing around with the test cases and when I reran the same tests I ran above the performance improvement was very small. Sigh. I ran those tests multiple times with each version and the results were very consistent (15-19% improvement), now they are not. I think I need to wait until we get those dedicated lab machines to run these tests correctly (separating client from server). |
OK testing on large lab machines shows no noticeable improvement with this fix :-( That doesn't mean this should be closed, as we are most likely hitting another bottle neck, so this fix "could" improve things down the road when other issues are addressed |
Description: There is a lot of contention around the rwlocks for the attribute syntax hashtables. Since syntaxes rarely change we can just keep a copy in each thread and avoid locking. Then the worker can check for changes on each loop and rebuild the hashtables as needed. Did some code cleanup in schema.c relates: 389ds#4598 Reviewed by: progier(Thanks!)
f06350c
to
d628b50
Compare
Description:
There is a lot of contention around the rwlocks for the attribute syntax hashtables. Since syntaxes rarely change we can just keep a copy in each thread and avoid locking.
Then the worker can check for changes on each loop and rebuild the hashtables as needed.
Did some code cleanup in schema.c
relates: #4598