
Evaluate block-based Elemental matrix distributions versus normal elemental distributions #30

Closed
bvanessen opened this issue Feb 6, 2017 · 9 comments

@bvanessen (Collaborator)

Test the performance of the new Elemental block matrix layout and compare it to the standard element-based layout, focusing particularly on convolutional kernels.

@ndryden (Collaborator) commented Feb 7, 2017

I ran some performance comparisons of the elemental and block matrix layouts doing GEMMs on Catalyst, using 32-bit floats and Elemental's default blocksize:

  • 1 node: mats1.pdf
  • 2 nodes: mats2.pdf
  • 4 nodes: mats4.pdf
  • 16 nodes: mats16.pdf

Once we get beyond small matrices, the block distribution is better. The exact cross-over point depends on the number of nodes. It probably makes sense to switch to it for fully-connected layers.
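
For reference, a minimal sketch of this kind of GEMM comparison using Elemental's C++ API. The matrix size is a placeholder, and the timing/sweep details behind the PDFs above are not reproduced; this assumes El::Gemm accepts both element-wise and block DistMatrix types, as the thread later confirms.

```cpp
#include <El.hpp>

int main(int argc, char* argv[])
{
    El::Environment env(argc, argv);
    El::Grid grid(El::mpi::COMM_WORLD);
    const El::Int n = 4096;  // placeholder size; the real benchmark sweeps many sizes

    // Element-wise (standard elemental) distribution.
    El::DistMatrix<float> A(grid), B(grid), C(grid);
    El::Uniform(A, n, n);
    El::Uniform(B, n, n);
    El::Uniform(C, n, n);
    El::mpi::Barrier(El::mpi::COMM_WORLD);
    double start = El::mpi::Time();
    El::Gemm(El::NORMAL, El::NORMAL, 1.0f, A, B, 0.0f, C);
    El::mpi::Barrier(El::mpi::COMM_WORLD);
    if (El::mpi::Rank(El::mpi::COMM_WORLD) == 0)
        El::Output("elemental Gemm: ", El::mpi::Time() - start, " s");

    // Block distribution with Elemental's default blocksize.
    El::DistMatrix<float, El::MC, El::MR, El::BLOCK> Ab(grid), Bb(grid), Cb(grid);
    El::Uniform(Ab, n, n);
    El::Uniform(Bb, n, n);
    El::Uniform(Cb, n, n);
    El::mpi::Barrier(El::mpi::COMM_WORLD);
    start = El::mpi::Time();
    El::Gemm(El::NORMAL, El::NORMAL, 1.0f, Ab, Bb, 0.0f, Cb);
    El::mpi::Barrier(El::mpi::COMM_WORLD);
    if (El::mpi::Rank(El::mpi::COMM_WORLD) == 0)
        El::Output("block Gemm: ", El::mpi::Time() - start, " s");
    return 0;
}
```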

@ndryden (Collaborator) commented Mar 21, 2017

An issue with this: several Elemental methods that the documentation lists as taking AbstractDistMatrix arguments are in fact only implemented for ElementalMatrix types and do not support BlockMatrix types. The ones I have found so far (a workaround sketch follows the list):

  • Hadamard
  • Dot
  • HilbertSchmidt (needed because Dot calls it)
  • ColumnTwoNorms
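
Until block support lands, one presumable workaround is to redistribute to an element-wise distribution first. A minimal sketch, assuming El::Copy's usual cross-distribution support; the helper name BlockDot is hypothetical:

```cpp
#include <El.hpp>

// Hypothetical workaround used before block support: redistribute to an
// element-wise distribution, then call the ElementalMatrix-only routine.
float BlockDot(const El::DistMatrix<float, El::MC, El::MR, El::BLOCK>& xBlock,
               const El::DistMatrix<float, El::MC, El::MR, El::BLOCK>& yBlock)
{
    El::DistMatrix<float> x(xBlock.Grid()), y(yBlock.Grid());
    El::Copy(xBlock, x);  // El::Copy handles the redistribution
    El::Copy(yBlock, y);
    return El::Dot(x, y);  // Dot is implemented for ElementalMatrix types
}
```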

@bvanessen (Collaborator, author) commented Mar 21, 2017 via email

@ndryden (Collaborator) commented Mar 21, 2017

Probably not too hard, especially if the functions don't actually need the elemental layout. If that's the case, then they should just take AbstractDistMatrix types.

We also need to update our Elemental extensions (ColumnSum and ColumnMax).
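
As a hedged illustration of what "just take AbstractDistMatrix types" could look like for a ColumnSum-style extension: this is not LBANN's actual implementation, and the signature and reduction strategy are assumptions. The point is that the routine only needs the local data plus the column communicator, so it is distribution-agnostic.

```cpp
#include <vector>
#include <El.hpp>

// Illustrative sketch (not LBANN's actual code): a column sum that works for
// any distribution, since it touches only local data plus the column
// communicator. On return, sums[jLoc] holds the full sum of the jLoc-th
// locally owned column.
template <typename T>
void ColumnSum(const El::AbstractDistMatrix<T>& A, std::vector<T>& sums)
{
    const El::Matrix<T>& ALoc = A.LockedMatrix();
    sums.assign(ALoc.Width(), T(0));
    for (El::Int jLoc = 0; jLoc < ALoc.Width(); ++jLoc)
        for (El::Int iLoc = 0; iLoc < ALoc.Height(); ++iLoc)
            sums[jLoc] += ALoc.Get(iLoc, jLoc);
    // Combine the partial sums from the processes sharing each column.
    El::mpi::AllReduce(sums.data(), static_cast<int>(sums.size()),
                       El::mpi::SUM, A.ColComm());
}
```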

@ndryden (Collaborator) commented Mar 24, 2017

I implemented the above operations and will be making a pull request to Elemental soon.

However, Jack Poulson was skeptical that the block matrix distribution would be better in El::Gemm, both because of its current implementation and because, even with the implementation fixed, there would be little difference in performance. I updated my benchmark and re-ran some tests this morning, and I now find that the element distribution is ~60% faster for the largest matrices. I'm not sure what led to the performance we saw in the results above.

We may still get an improvement on convolutional kernels; I have not looked into that. Once the PR is in Elemental, we could test it end-to-end in LBANN.

@ndryden (Collaborator) commented Mar 28, 2017

Elemental as of commit 776b805f0131f39ceeec8943f19a7803fa950d43 now supports block versions of Hadamard, Dot, ColumnTwoNorms, and ColumnMaxNorms. (And has tests to confirm that they're correct.)
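
After that commit, usage on block-distributed matrices should look roughly like the following hedged sketch (the grid setup and sizes are placeholders, not from the thread):

```cpp
#include <El.hpp>

int main(int argc, char* argv[])
{
    El::Environment env(argc, argv);
    El::Grid grid(El::mpi::COMM_WORLD);
    const El::Int m = 1000, n = 1000;  // placeholder sizes

    El::DistMatrix<float, El::MC, El::MR, El::BLOCK> A(grid), B(grid), C(grid);
    El::Uniform(A, m, n);
    El::Uniform(B, m, n);

    El::Hadamard(A, B, C);          // entrywise product, now block-aware
    const float d = El::Dot(A, B);  // Frobenius inner product, likewise
    if (El::mpi::Rank(El::mpi::COMM_WORLD) == 0)
        El::Output("dot = ", d);
    // ColumnTwoNorms and ColumnMaxNorms gained analogous block support.
    return 0;
}
```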

@bvanessen (Collaborator, author) commented Mar 28, 2017 via email

@bvanessen (Collaborator, author)

I believe that we can close this issue because Elemental has a very efficient GEMM implementation for the element-wise matrix distribution. Nikoli, can you please document your conversation with Jack so that we can put this to rest?

@ndryden (Collaborator) commented Jun 20, 2017

See this comment for full details, but basically: the GEMM for the elemental distribution still does local GEMMs using BLAS-3 acceleration, and the only difference is a slight change in the communication pattern (MPI_Allgather instead of MPI_Bcast).

The current Elemental implementation of the GEMM for block distributions actually internally converts from a block to an elemental distribution.
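
In other words, the described behavior is roughly equivalent to the following sketch. This mirrors the thread's description of the conversion, not Elemental's exact internals; the function name is hypothetical.

```cpp
#include <El.hpp>

// Sketch of the described behavior: the block-distribution Gemm
// redistributes its inputs to element-wise form, multiplies there,
// and copies the product back to the block distribution.
template <typename T>
void BlockGemmViaElemental(
    T alpha, const El::DistMatrix<T, El::MC, El::MR, El::BLOCK>& A,
             const El::DistMatrix<T, El::MC, El::MR, El::BLOCK>& B,
    T beta,  El::DistMatrix<T, El::MC, El::MR, El::BLOCK>& C)
{
    El::DistMatrix<T> AElem(A.Grid()), BElem(B.Grid()), CElem(C.Grid());
    El::Copy(A, AElem);
    El::Copy(B, BElem);
    El::Copy(C, CElem);
    El::Gemm(El::NORMAL, El::NORMAL, alpha, AElem, BElem, beta, CElem);
    El::Copy(CElem, C);  // redistribute the result back to block form
}
```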

@ndryden closed this as completed Jun 20, 2017