
Evaluate block-based Elemental matrix distributions versus normal elemental distributions #30

Closed
bvanessen opened this issue Feb 6, 2017 · 9 comments

@bvanessen (Collaborator)

Test the performance of the new Elemental block matrix layout and compare it to the standard element-based layout, focusing particularly on convolutional kernels.

@ndryden (Collaborator) commented Feb 7, 2017

I ran some performance comparisons of the elemental and block matrix layouts doing GEMMs on Catalyst, using 32-bit floats and Elemental's default blocksize:

  • 1 node: mats1.pdf
  • 2 nodes: mats2.pdf
  • 4 nodes: mats4.pdf
  • 16 nodes: mats16.pdf

Once we get beyond small matrices, the block distribution is better. The exact cross-over point depends on the number of nodes. It probably makes sense to switch to it for fully-connected layers.
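
For reference, a minimal sketch of this kind of GEMM comparison using Elemental's C++ API. The matrix size is a placeholder, and the timing/sweep details behind the PDFs above are not reproduced; this assumes El::Gemm accepts both element-wise and block DistMatrix types, as the thread later confirms.

```cpp
#include <El.hpp>

int main(int argc, char* argv[])
{
    El::Environment env(argc, argv);
    El::Grid grid(El::mpi::COMM_WORLD);
    const El::Int n = 4096;  // placeholder size; the real benchmark sweeps many sizes

    // Element-wise (standard elemental) distribution.
    El::DistMatrix<float> A(grid), B(grid), C(grid);
    El::Uniform(A, n, n);
    El::Uniform(B, n, n);
    El::Uniform(C, n, n);
    El::mpi::Barrier(El::mpi::COMM_WORLD);
    double start = El::mpi::Time();
    El::Gemm(El::NORMAL, El::NORMAL, 1.0f, A, B, 0.0f, C);
    El::mpi::Barrier(El::mpi::COMM_WORLD);
    if (El::mpi::Rank(El::mpi::COMM_WORLD) == 0)
        El::Output("elemental Gemm: ", El::mpi::Time() - start, " s");

    // Block distribution with Elemental's default blocksize.
    El::DistMatrix<float, El::MC, El::MR, El::BLOCK> Ab(grid), Bb(grid), Cb(grid);
    El::Uniform(Ab, n, n);
    El::Uniform(Bb, n, n);
    El::Uniform(Cb, n, n);
    El::mpi::Barrier(El::mpi::COMM_WORLD);
    start = El::mpi::Time();
    El::Gemm(El::NORMAL, El::NORMAL, 1.0f, Ab, Bb, 0.0f, Cb);
    El::mpi::Barrier(El::mpi::COMM_WORLD);
    if (El::mpi::Rank(El::mpi::COMM_WORLD) == 0)
        El::Output("block Gemm: ", El::mpi::Time() - start, " s");
    return 0;
}
```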

@ndryden (Collaborator) commented Mar 21, 2017

An issue with this: several Elemental methods that the documentation lists as taking AbstractDistMatrix arguments are in fact only implemented for ElementalMatrix types and do not support BlockMatrix types. The ones I have found so far (a workaround sketch follows the list):

  • Hadamard
  • Dot
  • HilbertSchmidt (needed because Dot calls it)
  • ColumnTwoNorms
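
Until block support lands, one presumable workaround is to redistribute to an element-wise distribution first. A minimal sketch, assuming El::Copy's usual cross-distribution support; the helper name BlockDot is hypothetical:

```cpp
#include <El.hpp>

// Hypothetical workaround used before block support: redistribute to an
// element-wise distribution, then call the ElementalMatrix-only routine.
float BlockDot(const El::DistMatrix<float, El::MC, El::MR, El::BLOCK>& xBlock,
               const El::DistMatrix<float, El::MC, El::MR, El::BLOCK>& yBlock)
{
    El::DistMatrix<float> x(xBlock.Grid()), y(yBlock.Grid());
    El::Copy(xBlock, x);  // El::Copy handles the redistribution
    El::Copy(yBlock, y);
    return El::Dot(x, y);  // Dot is implemented for ElementalMatrix types
}
```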

@bvanessen (Collaborator, author) commented Mar 21, 2017 via email

@ndryden (Collaborator) commented Mar 21, 2017

Probably not too hard, especially if the functions don't actually need the elemental layout. If that's the case, then they should just take AbstractDistMatrix types.

We also need to update our Elemental extensions (ColumnSum and ColumnMax).
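
As a hedged illustration of what "just take AbstractDistMatrix types" could look like for a ColumnSum-style extension: this is not LBANN's actual implementation, and the signature and reduction strategy are assumptions. The point is that the routine only needs the local data plus the column communicator, so it is distribution-agnostic.

```cpp
#include <vector>
#include <El.hpp>

// Illustrative sketch (not LBANN's actual code): a column sum that works for
// any distribution, since it touches only local data plus the column
// communicator. On return, sums[jLoc] holds the full sum of the jLoc-th
// locally owned column.
template <typename T>
void ColumnSum(const El::AbstractDistMatrix<T>& A, std::vector<T>& sums)
{
    const El::Matrix<T>& ALoc = A.LockedMatrix();
    sums.assign(ALoc.Width(), T(0));
    for (El::Int jLoc = 0; jLoc < ALoc.Width(); ++jLoc)
        for (El::Int iLoc = 0; iLoc < ALoc.Height(); ++iLoc)
            sums[jLoc] += ALoc.Get(iLoc, jLoc);
    // Combine the partial sums from the processes sharing each column.
    El::mpi::AllReduce(sums.data(), static_cast<int>(sums.size()),
                       El::mpi::SUM, A.ColComm());
}
```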

@ndryden (Collaborator) commented Mar 24, 2017

I implemented the above operations and will be making a pull request to Elemental soon.

However, Jack Poulson was skeptical that the block matrix distribution would be better in El::Gemm, both because of its current implementation and because, even with the implementation fixed, there would be little difference in performance. I updated my benchmark and re-ran some tests this morning, and I now find that the element distribution is ~60% faster for the largest matrices. I'm not sure what led to the performance we saw in the results above.

We may still get an improvement on convolutional kernels; I have not looked into that. Once the PR is in Elemental, we could test it end-to-end in LBANN.

@ndryden (Collaborator) commented Mar 28, 2017

Elemental as of commit 776b805f0131f39ceeec8943f19a7803fa950d43 now supports block versions of Hadamard, Dot, ColumnTwoNorms, and ColumnMaxNorms. (And has tests to confirm that they're correct.)
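
After that commit, usage on block-distributed matrices should look roughly like the following hedged sketch (the grid setup and sizes are placeholders, not from the thread):

```cpp
#include <El.hpp>

int main(int argc, char* argv[])
{
    El::Environment env(argc, argv);
    El::Grid grid(El::mpi::COMM_WORLD);
    const El::Int m = 1000, n = 1000;  // placeholder sizes

    El::DistMatrix<float, El::MC, El::MR, El::BLOCK> A(grid), B(grid), C(grid);
    El::Uniform(A, m, n);
    El::Uniform(B, m, n);

    El::Hadamard(A, B, C);          // entrywise product, now block-aware
    const float d = El::Dot(A, B);  // Frobenius inner product, likewise
    if (El::mpi::Rank(El::mpi::COMM_WORLD) == 0)
        El::Output("dot = ", d);
    // ColumnTwoNorms and ColumnMaxNorms gained analogous block support.
    return 0;
}
```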

@bvanessen (Collaborator, author) commented Mar 28, 2017 via email

@bvanessen (Collaborator, author)

I believe that we can close this issue because Elemental has a very efficient GEMM implementation for the element-wise matrix distribution. Nikoli, can you please document your conversation with Jack so that we can put this to rest?

@ndryden (Collaborator) commented Jun 20, 2017

See this comment for full details, but basically: the GEMM for the elemental distribution still does local GEMMs using BLAS-3 acceleration, and the only difference is a slight change in the communication pattern (MPI_Allgather instead of MPI_Bcast).

The current Elemental implementation of the GEMM for block distributions actually internally converts from a block to an elemental distribution.
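
In other words, the described behavior is roughly equivalent to the following sketch. This mirrors the thread's description of the conversion, not Elemental's exact internals; the function name is hypothetical.

```cpp
#include <El.hpp>

// Sketch of the described behavior: the block-distribution Gemm
// redistributes its inputs to element-wise form, multiplies there,
// and copies the product back to the block distribution.
template <typename T>
void BlockGemmViaElemental(
    T alpha, const El::DistMatrix<T, El::MC, El::MR, El::BLOCK>& A,
             const El::DistMatrix<T, El::MC, El::MR, El::BLOCK>& B,
    T beta,  El::DistMatrix<T, El::MC, El::MR, El::BLOCK>& C)
{
    El::DistMatrix<T> AElem(A.Grid()), BElem(B.Grid()), CElem(C.Grid());
    El::Copy(A, AElem);
    El::Copy(B, BElem);
    El::Copy(C, CElem);
    El::Gemm(El::NORMAL, El::NORMAL, alpha, AElem, BElem, beta, CElem);
    El::Copy(CElem, C);  // redistribute the result back to block form
}
```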

@ndryden closed this as completed Jun 20, 2017