-
-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change ndarray-linalg backend to intel-mkl #369
Conversation
I don't have the ability to do a thorough review right now, but I still want to respond shortly to a few points. First of all thank you so much for the amount of time and effort and brainpower you have put into this PR and the related issue. Windows support is definitely important, but I can't provide it myself. I highly appreciate it! Regarding older versions of ndarray: my take on this was to support older versions of ndarray as long as they are not causing problems. I agree that it is not that important to keep supporting them, therefore feel free to drop them. It will take me another couple of weeks until I am able to review this PR in detail, but I will keep an eye on it and try to respond as well as possible. |
84ce76d
to
4c8d905
Compare
Thanks!
All good, I know you are currently traveling around, so take your time. Due to personal reasons, I won't be having that much time in the next few weeks anyway.
Well, now I implemented the workaround with different packages in a reasonably clean way I think. Of course, it's still a bit hacky and without the workaround it would be cleaner. I don't really care either way, I am completely fine with removing those workaround test packages and dropping the older versions of ndarray. The failing clippy CI job failure for The failing job for the cargo-deny for licenses could possibly indicate a bigger problem. The intel-mkl backend is licensed under the "Intel Simplified Software License", which I don't know much about. It certainly is not open-source. This shouldn't really affect the license of argmin itself, as the backend is only there under a feature for testing purposes and users of argmin are expected to depend on a backend themselves explicitly if required. However, a user could potentially enable the features that are only intended for development and thereby depend on intel-mkl through argmin. On the other hand the license of ndarray-linalg which is the immediate dependency of argmin providing the intel-mkl feature, has the standard MIT OR Apache-2.0 license. I don't know how licensing works in these circumstances. A possible alternative to be safe could be to specify ndarray-linalg as a dev-dependency for argmin and remove all features that could enable it as a normal dependency. This would have the drawback that it would always build and link against it while testing, but a user could never depend on it through argmin. |
Sorry again for the delay!
I did give it a try and ended up with the same error locally. I had a look at the test as well as the code and I moved the
I had a look at the changes and to be honest at first sight I don't fully understand what is going on. I assume you followed "Solution 1" which you described in EmbarkStudios/cargo-deny#368. Did that cause any issues with private implementation details? // The tests expect the name for the crate containing the tested functions to be argmin_math
#[cfg(test)]
use crate as argmin_math;
include!(concat!(
env!("CARGO_MANIFEST_DIR"),
"/ndarray-tests-src/eye.rs"
)); I assumed that there a no more tests in
I guess your workaround will stay relevant if there are any future versions of ndarray(-linalg) that we may want to support alongside older versions. Therefore I'd leave the decision whether to drop support for older versions or not for later.
You can ignore those. Newer versions of Rust come with new/adapted clippy lints, therefore we regularly have new lints popping up in the CI. It's even worse in the CI than locally because the CI uses the beta channel for clippy.
This is actually my biggest concern. I will have to think about this in detail. Sometimes you just wish you had a legal team! |
No problem, I hope you had a nice time away from the project for a while and now happy holidays!
That's very weird, I just tried again locally and the test still passes locally on my machine. Just to make sure we are running the same test, I use the command
Correct, testing is moved to dedicated crates to avoid the problems described in the issue. Namely that using features to select different backend versions doesn't work if those backends specify the same links attribute for different versions. The only way, short of completely ditching all but one versions of the offending backend, is using completely different crates instead of features. Afaik there were no problems related to testing private implementation details, I basically separated all tests as they were into separate files, which are then included in all test crates by macro. I guess the reason is that everything happening in argmin-math is implementation of a public trait of argmin for concrete public types. So everything was already publicly accessible anyway. If there are still questions please don't hesitate to ask.
The reason I left it in was to indicate to a reader where the tests live. It was a (rather bad) attempt at code as documentation. But it also has the benefit that rust still compiles (and checks!) the tests when changes in the implementation occur. So even if the tests in the implementation files cannot be run because of the missing backend, including them can at least make sure that syntax errors are caught at compile time.
Yeah, I guess I can't help you with that decision. Personally I would feel safe with the mentioned workaround/alternative. As long as there is no way to pull in the intel-mkl backend directly through |
Thanks! :) Belated happy holidays to you too! :)
Yes, I was running this exact test. I don't have MKL installed, therefore I assume that the
Sure, these errors can of course lead to increasing errors in the long run -- I simply don't have a feeling of which magnitude of error is to be expected. But I think without knowing what is actually going on in MKL it may be difficult to judge whether this is acceptable or not. Its interesting that we get different results. Either this is due to different MKL versions or due to differences in the processors we are using. Apparently CPUs may also reorder instructions, which could have an effect on the result.
This sounds great! I did not expect any issues with private implementation details, but if there were, it would hint to an issue (because as you said, its all public interfaces).
Ah, this also a good idea. I would appreciate it though if you could add a short comment to indicate that, because I am for sure going to forget that when I look at the code the next time ;)
Just for my understanding, when you talk about |
682519c
to
2633198
Compare
So, I just added a comment for the explanation for why the tests are still included in the argmin-math ndarray source and now the previously failing test passes in CI in my fork? Really weird.
I don't think so, intel-mkl is linked statically, I wouldn't think I have a static lib already preinstalled in wsl. But I also don't really how to check which version is used in the test besides looking at the compilation logs and comparing to my local build output. And there everything is the same.
While CPUs can reorder instructions, they don't (or at least shouldn't in the absence of bugs) do that if results could change. Afaik CPUs also don't have something like a
As mentioned I added a comment above all the test inclusions in the argmin-math ndarray source. However, I have to revise my previous statement regarding the building and compilation a bit. The way it works now is that all tests can be run directly in the argmin-math crate as long as no feature for ndarray-linalg is enabled. As soon as such a feature is enabled, an error for a missing dynamic library will occur when compiling. Since ndarray-linalg is only necessary for the
I mean both, because I had the same thought as you:
As long as the intel-mkl dependency cannot be pulled into the regular dependency graph through a feature there should be no licensing problem. Since the |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #369 +/- ##
=======================================
Coverage 90.16% 90.16%
=======================================
Files 126 141 +15
Lines 19679 19679
=======================================
Hits 17744 17744
Misses 1935 1935 ☔ View full report in Codecov by Sentry. |
Ha, fun! :D It still fails on my local machine. I updated Rust from 1.74.1 to 1.75, ran
Since MKL is apparently multi-core I assumed that the order in which operations are performed isn't necessarily fixed, but apparently that's not true. Even if that was the case, the test uses matrices/vectors so small that I'm fairly sure all operations run on a single core.
Ah yes, that makes sense.
Yes, I agree!
Another great idea, thanks! However, now |
I was just playing around with
I don't know yet what to do with this information, but I wanted to document it here just in case. Because of this dependency issue, but also because the warning
annoys me everytime I compile, would you be opposed to changing the resolver to |
Thanks yet again for the amount of effort you put into this PR, I really appreciate it! I don't have much time right now to respond in detail, but I wanted to quickly reply regarding that point:
This is already done in |
6b7e701
to
0aa857d
Compare
Thanks, I appreciate your appreciation :) I just opened an an issue for the differences in resolved dependencies between |
Changed the backend used for the ndarray-linalg dev-dependency from netlib to intel-mkl to enable running tests on windows. This does not enable running the tests of argmin-math on windows, and the backend used by argmin-math for testing remains unchanged. Added branch to ci to test out changes before merging.
Due to the additivity requirement on cargo features even for features that are not enabled simultaneously, together with the requirement that only one crate in the dependency tree specifies any `links` attribute, it was not possible to test multiple backend versions for ndarray-linalg by features: argmin-rs#368 (comment) Previously this approach worked for different versions of ndarray-linalg insofar as the used backend `netlib` just coincidentally remained the same for all tested ndarray-linalg versions. This is not the case for the intel-mkl backend, which is desirable to use as the default because it works on all three major platforms. To enable the change of the backend as well as make the testing more robust against future changes in ndarray-linalg backend versions, the tests were factored out into their own crates. This works because the strict additivity requirement for inactive features is not enforced for dependencies: rust-lang/cargo#5969 (comment)
Moved the ndarray-linalg test packages into the argmin-math directory, as they semantically only belong to that package
…hs ndarray src even though separate crates are used for testing with the ndarray-linalg backend
…h accidents Set `licenses.private.ignore = true` in deny.toml to prevent license errors in unpublished crates
0aa857d
to
d1d174e
Compare
They already fixed I also rebased again on the latest main and resolved some conflicts that popped up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again, this looks good! There is nothing to complain, but I'm still not sure how to proceed because of cargo-deny
. Is it possible to temporarily allow the license? We just have to make sure that we don't forget to disable it once a new cargo-deny
version is out.
I can put the license on the whitelist if you want, but tracking when to take it off of the list again would be manual. It would probably be a good idea to have another PR lined up to take it off the whitelist again from the get-go so it doesn't get missed. If you want I can do that. |
This sounds like a reasonable approach! |
…mbarkStudios/krates#60 is available in the cargo-deny-action
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
No problem, it was fun to work on this! |
This PR addresses EmbarkStudios/cargo-deny#368 by changing the ndarray-linalg backend used for all testing purposes from netlib to intel-mkl. This enables running the tests, and thus development, on Windows, as the intel-mkl is the only backend available on all platforms. This change, however, necessitates restructuring of the tests using this backend as described in the linked issue. Another option would be to cease testing against the older versions of the ndarray-linalg dependency. The second newest release is already over 2 years old and the
ndarray
version currently tested with that version nearly 3 years old. I would guess most people wouldn't use a new version of argmin with such an old version of ndarray.I ran all tests locally in Windows and WSL Linux with everything passing. However, on CI, the test
solver::gaussnewton::gaussnewton_linesearch::tests::test_next_iter_regression
fails due to a change in the result. I currently don't know what the reason for this might be, as the test passes locally on exactly the same version of Ubuntu used in CI. Locally I have an x64 CPU, which should also be the architecture used in CI. The expected result is2.25
while the actual result in CI is2.2499999999999964
according to the logs. I don't know if the error message shows all digits. If the message actually shows all digits, this would be a difference of exactly16 * f64::EPSILON
. If you have a clue why this discrepancy between my local run and CI happens, I would be delighted to know.