Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support block matrices in several BLAS1 routines #226

Merged
merged 4 commits into from
Mar 28, 2017
Merged

Support block matrices in several BLAS1 routines #226

merged 4 commits into from
Mar 28, 2017

Conversation

ndryden
Copy link
Contributor

@ndryden ndryden commented Mar 24, 2017

This adds support for block matrices in the Hadamard, Dot, HilbertSchmidt, ColumnTwoNorms, and ColumnMaxNorms functions (see #224). I also added tests for Hadamard, Dot, ColumnTwoNorms, and ColumnMaxNorms.

Let me know if any additional changes are needed.

ndryden added 2 commits March 24, 2017 07:24
…idt to support block matrices, and add tests for Hadamard, Dot, ColumnTwoNorms, and ColumnMaxNorms.
Copy link
Member

@poulson poulson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the very high-quality PR! I only have a few minor comments.

T expected = 0;
for (Int i = 0; i < A.LocalHeight(); ++i) {
T val = A.GetLocal(i, j);
expected += val * val;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly accumulating the squares and then square-rooting should be fine for random data but can lead to very low accuracy in extreme cases. There is the routine El::UpdateScaledSquare for accumulating the square of the two norm as the product of a scale and a scaled square that should generally be preferred (and which is used within the norm computation routines).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main goal there was to provide a simple baseline that ensures there's no major issues (e.g. block vs element distributions causing problems) while not just reimplementing the method that's being tested. If you think it would be better to switch to UpdateScaledSquare in this case, I can do that.

}
expected = mpi::AllReduce(expected, g.ColComm());
expected = Sqrt(expected);
if (Abs(got - expected) > 1e-5) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more rigorous to use a bound of the form n * limits::Epsilon<El::Base<T>>() times a small constant (e.g., 10).

TestDot<float, ELEMENT>(m, n, g, print);
TestDot<float, BLOCK>(m, n, g, print);
TestDot<double, ELEMENT>(m, n, g, print);
TestDot<double, BLOCK>(m, n, g, print);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exclude complex tests and more precise datatypes (e.g., El::DoubleDouble, El::QuadDouble, El::Quad, and El::BigFloat)? I would be happy to add the appropriate flag guards for the higher-precision tests, but the complex arithmetic variants also working is crucial.

TestDot<float, BLOCK>(m, n, g, print);
TestDot<double, ELEMENT>(m, n, g, print);
TestDot<double, BLOCK>(m, n, g, print);
} catch (exception& e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Elemental has made the (perhaps dubious) choice of putting curly braces on their own line.

for (Int i = 0; i < A.LocalHeight(); ++i) {
T got = C.GetLocal(i, j);
T expected = A.GetLocal(i, j) * B.GetLocal(i, j);
if (Abs(got - expected) > 1e-6) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about using El::limits::Epsilon<El::Base<T>>() times a small constant.

TestHadamard<float, ELEMENT>(m, n, g, print);
TestHadamard<float, BLOCK>(m, n, g, print);
TestHadamard<double, ELEMENT>(m, n, g, print);
TestHadamard<double, BLOCK>(m, n, g, print);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about testing the complex (and high-precision) cases.

@@ -60,6 +60,9 @@ void Hadamard
LogicError("A, B, and C must share the same distribution");
if( A.ColAlign() != B.ColAlign() || A.RowAlign() != B.RowAlign() )
LogicError("A and B must be aligned");
if ( A.BlockHeight() != B.BlockHeight() ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you noticing this crucial portion of the extension!

…ision and in complex variants. Clean up the syntax and use limits::Epsilon for checking error.
@ndryden
Copy link
Contributor Author

ndryden commented Mar 27, 2017

My latest commit addresses your comments (except see the discussion on UpdateScaledSquare). Let me know if there's any other changes required!

} catch (exception& e) {
TestDot<Complex<double>, ELEMENT>(m, n, g, print);
TestDot<Complex<double>, BLOCK>(m, n, g, print);
#if defined(EL_HAVE_QD) && defined(EL_ENABLE_DOUBLEDOUBLE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for the EL_ENABLE macros here (or in any test driver), as they are only used to signal to some of the include macros used for template instantiation in the library.

T val = A.GetLocal(i, j);
expected += val * val;
}
expected = mpi::AllReduce(expected, g.ColComm());
expected = Sqrt(expected);
if (Abs(got - expected) > 1e-5) {
if (Abs(got - expected) > 10 * limits::Epsilon<El::Base<T>>())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The accumulated error should be a function of the number of entries (with said function being logarithmic if a tree-based scheme is used, but linear with the current single-process implementation): would you mind multiplying the right-hand side by the number of entries?

Also, it would be better to use a relative bound and to diving the left-hand side by the maximum of the norm and 1 (so that it still behaves well for zero norms).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to clarify: The new check should be Abs(got - expected) / std::max(expected, 1) > m * n * 10 * limits::Epsilon<El::Base<T>>()?

@poulson
Copy link
Member

poulson commented Mar 27, 2017

yes, if that isn't too much hassle. otherwise it is easy to think of cases where the check would fail.

Copy link
Member

@poulson poulson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for incorporating the changes!

@poulson poulson merged commit 776b805 into elemental:master Mar 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants