-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for i8
dtype, add --raw_accumulators
flag, add --target=host_cpu
for easy local testing.
#22
Conversation
Signed-off-by: Benoit Jacob <[email protected]> r
Signed-off-by: Benoit Jacob <[email protected]> x
Signed-off-by: Benoit Jacob <[email protected]>
Sample results on CPU:
|
# The raw_accumulators arg means "test configs where the result element | ||
# type is different from what it would be in the default mode". | ||
# We can't just test for (result_element_type == accumulator_element_type), | ||
# as that would cause e.g. f32 matmuls to be omitted in the default mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a warning here that tells the user they are trying to do a raw_accumulators
run where the config.operand_element_type== get_default_accumulator_element_type(config.operand_element_type)
which we don't run, so they aren't confused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such a warning would print every time the user passes --raw_accumulators
, right? I was thinking that since this flag is non-default, it's OK for it to have slightly suprising semantics of omitting the cases that happen to be already covered by the default mode. I cared more about keeping the default mode unsurprising (including if in the future we add f32 benchmarks) and avoiding overlap between the two modes redundantly covering the same cases (would be wasteful if running both modes one after the other).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would only print if they are running --raw_accumulators with f32 or i32 input configs. And if that is the case, might be worth to have a small print or warning to let them know they are being skipped because of this, but fine with either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and merged this because I want to use i8 in my experiments
A few unrelated things mixed in this PR, but they are separate commits if you'd prefer me to slice it into 3 PRs.
--raw_accumulators
flag that drops the truncation of the results (default False). This leads to lower arithmetic intensity (because the result values are larger) and either higher or lower performance. This is less representative of real workloads, but is sometimes easier to reason about as a microbenchmark.i8
dtype accumulating intoi32
. For now only added to thesquare
problem set. Also addedbf16
to that set.--target
flag:"host_cpu"
for testing on CPU configured for the host. This was mostly for my own use to be able to develop these changes locally without a GPU.