-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More performance improvements with nssdb #137
base: main
Are you sure you want to change the base?
Conversation
Ok I nailed the second biggest performance issue. |
Eh, sounds like I broke something :-D |
Ok found a copy/paste bug in the reinit() function that caused the failures. |
This will allow to measure performance with some reliability. Signed-off-by: Simo Sorce <[email protected]>
The Zeroize crate performance is crazy slow. The search for a performance issue with the nssdb storage via valgrind's callgrind revelead that 50% of the time in the HMAC code was spent on zeroization because of the way the Zeroize crate is built. Replace it everywhere with a simpler zeromem function that underneath simply calls OPENSSL_cleanse() on the mutable slice passed in. With this change alone the new test that performs 100 pbkdf2 calls went from ~9s to ~4.5s on my laptop. Signed-off-by: Simo Sorce <[email protected]>
The allocations and zip/iterations were causing another huge performance hit (at least in debug mode). Switch HMAC code to use slices instead, and recode the iterations in the init code to be simpler. Signed-off-by: Simo Sorce <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just nits.
|
||
use constant_time_eq::constant_time_eq; | ||
use zeroize::Zeroize; | ||
|
||
/* max algo right now is SHA3_224 with 144 byts blocksize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* max algo right now is SHA3_224 with 144 byts blocksize, | |
/* max algo right now is SHA3_224 with 144 bytes blocksize, |
/* max algo right now is SHA3_224 with 144 byts blocksize, | ||
* use slightly larger for good measure (and alignment) */ | ||
const IPAD_INIT: [u8; 160] = [0x36; 160]; | ||
const OPAD_INIT: [u8; 160] = [0x5c; 160]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we make the number 160 constant instead of using it on at least 5 different places here and below?
Using valgrind's callgrind tool I am analyzing the behavior of with the nssdb storage backend which shows particularly bad performance in pkcs11-provider's CI.
The first easy pick was to remove the Zeroize crate, which single-handedly accounted for a 50% of the time spent in pbkdf2 operations.
The second easy pick was to rewrite the HAM initialization to use fixed slices instead of vectors for the internal state.
The nssdb is still much slower than the others, due to higher use of the pbkdf2 function, yet nssdb uses the same and it is much faster.
Some playing around shows the "native" version is about 20/30 % slower than the OpenSSL one we use in fips mode ... but there is definitely more work to do to improve performance here.