Evaluate block based Elemental matrix distributions versus normal elemental distributions #30
Comments
I ran some performance comparisons of the elemental and block matrix layouts doing GEMMs on Catalyst, using 32-bit floats and Elemental's default blocksize. [Plots for 1, 2, 4, and 16 nodes not preserved.] Once we get beyond small matrices, the block distribution is better. The exact cross-over point depends on the number of nodes. It probably makes sense to switch to it for fully-connected layers. |
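For readers unfamiliar with the two layouts being compared, here is a minimal sketch of how entries map to processes on a 2D grid. It is a simplified model, not Elemental's API: the function names are hypothetical, and it assumes zero alignment, square blocks, and no partial leading block.

```python
def elemental_owner(i, j, grid_height, grid_width):
    """Grid position (row, col) owning entry (i, j) in an element-cyclic layout."""
    return (i % grid_height, j % grid_width)


def block_owner(i, j, grid_height, grid_width, block_size):
    """Grid position owning entry (i, j) in a block-cyclic layout.

    Assumption: square blocks of side block_size, zero alignment.
    """
    return ((i // block_size) % grid_height, (j // block_size) % grid_width)


def local_rows(owner_row, num_rows, grid_height, block_size=1):
    """Global row indices stored by one process row (block_size=1 is elemental)."""
    return [i for i in range(num_rows)
            if (i // block_size) % grid_height == owner_row]


# 8 rows over 2 process rows: interleaved rows vs. contiguous pairs of rows.
print(local_rows(0, 8, 2))                # [0, 2, 4, 6]
print(local_rows(0, 8, 2, block_size=2))  # [0, 1, 4, 5]
```

Under either layout each process holds the same number of entries here; what changes is which global indices are stored together.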
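An issue with this: Several Elemental methods that the documentation lists as taking AbstractDistMatrix arguments are in fact only implemented for ElementalMatrix types and do not support BlockMatrix types. The ones I have found so far are Hadamard and Dot.
|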
How hard would it be to get block-based implementations of these functions? Hopefully, it won’t be too bad.
Brian C. Van Essen
[email protected]
(w) 925-422-9300
(c) 925-290-5470
… On Mar 21, 2017, at 6:52 AM, Nikoli Dryden ***@***.***> wrote:
An issue with this: Several Elemental methods that the documentation lists as taking AbstractDistMatrix arguments in fact are only implemented for ElementalMatrix types and do not support BlockMatrix types. Ones I have found this to be the case for:
• Hadamard
• Dot
|
Probably not too hard, especially if the functions don't actually need the Elemental layout. If that's the case, then they should just take … We also need to update our Elemental extensions (…). |
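To illustrate why operations like Hadamard and Dot should not need the Elemental layout: as long as both operands share the same distribution, matching entries sit on the same process, so the work is purely local plus one reduction. The following is a minimal sketch under that assumption, using hypothetical helper names and a 1D list of entries in place of a real distributed matrix.

```python
def partition(values, num_procs, block_size=1):
    """Split a flat list of entries across processes.

    block_size=1 models an element-cyclic layout; larger values model a
    block-cyclic layout. Both are simplifications for illustration.
    """
    local = [[] for _ in range(num_procs)]
    for idx, value in enumerate(values):
        local[(idx // block_size) % num_procs].append(value)
    return local


def local_hadamard(a_local, b_local):
    """Entry-wise product, computed independently on each process."""
    return [[x * y for x, y in zip(a, b)] for a, b in zip(a_local, b_local)]


def distributed_dot(a_local, b_local):
    """Local partial dot products, then a sum standing in for an all-reduce."""
    partials = [sum(x * y for x, y in zip(a, b))
                for a, b in zip(a_local, b_local)]
    return sum(partials)


a = [1, 2, 3, 4, 5, 6]
b = [6, 5, 4, 3, 2, 1]
for block_size in (1, 2):
    a_local = partition(a, 2, block_size)
    b_local = partition(b, 2, block_size)
    # Same result for both layouts: 56
    print(block_size, distributed_dot(a_local, b_local))
```

Neither function inspects the layout; it only requires that the two operands were partitioned the same way.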
I implemented the above operations and will be making a pull request to Elemental soon. However, Jack Poulson was skeptical about the block matrix distribution being better in El::Gemm (due to its implementation, plus the fact that even if we fixed the implementation, there would be little difference in performance). I updated my benchmark and re-ran some tests this morning, and I now find that the elemental distribution is ~60% faster for the largest matrices. I'm not sure what led to the performance we saw in the results above. We may still get an improvement on convolutional kernels, but I have not looked into that. Once the PR is in Elemental, we could test that end-to-end in LBANN. |
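Elemental as of commit 776b805f0131f39ceeec8943f19a7803fa950d43 now supports block versions of Hadamard, Dot, ColumnTwoNorms, and ColumnMaxNorms. (And has tests to confirm that they're correct.) |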
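As a rough illustration of why column norms carry over to block layouts, the sketch below computes per-column partial results on each process and then combines them. This is not Elemental's implementation: the function names are hypothetical, and each "process" is modeled as a plain list of full rows.

```python
import math


def column_two_norms(local_rows_per_proc, num_cols):
    """Per-column 2-norms from row-distributed data.

    Each process sums squares over its own rows; the partial sums are then
    added (standing in for an all-reduce) before taking the square root.
    """
    totals = [0.0] * num_cols
    for rows in local_rows_per_proc:
        partial = [0.0] * num_cols
        for row in rows:
            for j, value in enumerate(row):
                partial[j] += value * value
        totals = [t + p for t, p in zip(totals, partial)]
    return [math.sqrt(t) for t in totals]


def column_max_norms(local_rows_per_proc, num_cols):
    """Per-column max-abs values: local maxima, then a max across processes."""
    best = [0.0] * num_cols
    for rows in local_rows_per_proc:
        for row in rows:
            for j, value in enumerate(row):
                best[j] = max(best[j], abs(value))
    return best


# Rows [3, 1], [4, -2], [0, 2] split two ways over two processes.
elemental_split = [[[3, 1], [0, 2]], [[4, -2]]]  # rows 0, 2 | row 1
block_split = [[[3, 1], [4, -2]], [[0, 2]]]      # rows 0, 1 | row 2
print(column_two_norms(elemental_split, 2), column_two_norms(block_split, 2))
```

Because the reductions are commutative, the way rows are grouped onto processes does not affect the result.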
Great.
Brian C. Van Essen
… On Mar 27, 2017, at 6:01 PM, Nikoli Dryden ***@***.***> wrote:
Elemental as of commit 776b805f0131f39ceeec8943f19a7803fa950d43 now supports block versions of Hadamard, Dot, ColumnTwoNorms, and ColumnMaxNorms. (And has tests to confirm that they're correct.)
|
I believe that we can close this issue because the Elemental library has a very efficient implementation of the element-wise matrix operation. Nikoli, can you please document your conversation with Jack so that we can put this to rest? |
See this comment for full details, but basically: the GEMM in the elemental distribution still does local GEMMs with BLAS-3 acceleration, and the only difference is a slight change in communication pattern (…). The current Elemental implementation of the GEMM for block distributions actually internally converts from a block to an elemental distribution. |
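To give a feel for what an internal block-to-elemental conversion involves, the sketch below counts how many entries would have a different owner in the two layouts. It is only a model with hypothetical names, assuming a 2D grid, square blocks, and zero alignment; it says nothing about how Elemental actually schedules the redistribution.

```python
def owner(i, j, grid_height, grid_width, block_size=1):
    """Grid position owning entry (i, j); block_size=1 is the elemental layout."""
    return ((i // block_size) % grid_height, (j // block_size) % grid_width)


def moved_fraction(num_rows, num_cols, grid_height, grid_width, block_size):
    """Fraction of entries whose owner differs between block and elemental layouts."""
    moved = 0
    for i in range(num_rows):
        for j in range(num_cols):
            block = owner(i, j, grid_height, grid_width, block_size)
            elemental = owner(i, j, grid_height, grid_width)
            if block != elemental:
                moved += 1
    return moved / (num_rows * num_cols)


# 4x4 matrix on a 2x2 grid with 2x2 blocks: 12 of 16 entries change owner.
print(moved_fraction(4, 4, 2, 2, 2))  # 0.75
```

In this toy case most of the matrix has to move before the GEMM can proceed in the elemental layout, which is consistent with the block path not being faster.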
Test the performance of the new Elemental block matrix layout and compare it to the standard elemental-based layout. Focus particularly on convolutional kernels.