ASPLOS To Do
I have two new high-level pitches in mind for ACCEPT. I think they can both fit, but maybe we need to narrow it down.
When you build a traditional compiler, you have lots of common resources at your disposal: an IR with well-designed invariants, a general alias analysis, a dominator analysis, and so on.
This paper answers the question: what does a compiler infrastructure for approximate optimizations look like in practice? We build on LLVM's tools and provide new ones that enable this new kind of optimization. We show that these tools are sufficient to implement a wide variety of approximate optimizations.
The tools are:
- Common analyses (precise-purity, region selection)
- Static annotations
- Dynamic feedback
- Tuning heuristics
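To make the static-annotation tool concrete, here is a minimal sketch of what annotated application code looks like. It assumes ACCEPT's EnerJ-style C qualifiers from enerc.h (the APPROX qualifier and the ENDORSE endorsement); treat the exact spellings as illustrative.

```c++
// Minimal sketch of ACCEPT-style annotations (assumes enerc.h).
#include <enerc.h>

APPROX float mean(APPROX float *samples, int n) {
  APPROX float sum = 0.0f;
  for (int i = 0; i < n; ++i)
    sum += samples[i];  // approximate data: fair game for relaxation
  return sum / n;
}

int main() {
  APPROX float samples[64];
  // ... fill samples ...
  APPROX float m = mean(samples, 64);
  // Approximate data may not silently flow into precise code; the
  // programmer must endorse it explicitly, which bounds what the
  // compiler is allowed to break.
  float precise_m = ENDORSE(m);
  return precise_m > 0.5f;
}
```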
It's a deceptively enticing prospect to completely automate the approximate-programming process. Many other papers [Paraprox, loop perforation, ExpAX] are tempted to transparently and automatically break code.
In practice, total automation has multiple problems:
- Automated approaches can be too aggressive: It's hard for programmers to provide the right bounds on what should be approximated. A test suite, for example, is obviously incomplete; a quality metric can fail to account for ways that a program can be wrong; and even annotations can fall on "soft" data that's just too sensitive.
- Automated approaches can be too conservative: It can be very frustrating when a compiler fails to apply a complicated optimization because it might break something the programmer doesn't care about. There's an analogy to parallelizing compilers [and "optimization coaching" http://www.ccs.neu.edu/home/stamourv/papers/optimization-coaching.pdf and LLVM's new optimization diagnostics]: the programmer needs to understand where optimizations apply, where they don't, and why, so they can hope to improve the situation. Paraprox is a perfect example: it either applies or it doesn't, based on pattern-matching over code, and the programmer has no recourse if an expected optimization fails to make their code faster.
In sum, programmers are rightly suspicious of fully automated changes to their programs---especially when approximation is involved!
ACCEPT takes a different approach and gets the programmer involved. We provide visibility and control via the dual-feedback-loop approach.
We will target NPU execution, on the FPGA and possibly also in software (CPU or GPU), in tandem with other compiler transformations. This emphasizes the breadth of the tool's applicability.
The "toolbox" provided by ACCEPT to client optimizations currently consists of precise-purity and per-instruction precision. We will add region identification, which is used by the NPU optimization. We should also pitch this region selection as being useful for general approximate accelerators (e.g., QUORA, relaxed versions of something like C-Cores or BERET, etc.).
With region selection available, it will be easy to write another client optimization, which I'll call general perforation. This would just skip a single-entry, single-exit region of the CFG that's marked as approximate. We could make this randomized (only skip sometimes). This is easy to implement, likely to be successful, and different from loop perforation.
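Concretely, at the source level the transformation would amount to something like the following (the real rewrite would happen on the CFG in IR; `refine` and the probability `p` are hypothetical stand-ins):

```c++
#include <cstdlib>

void refine(float *image, int n);  // stand-in for an approximate SESE region

void step(float *image, int n, float p) {
  // Original code runs the region unconditionally:
  //   refine(image, n);
  // General perforation guards it with a coin flip. The tuner sets the
  // skip probability p; because the region is single-entry/single-exit
  // and marked approximate, skipping it degrades quality at worst.
  if ((float)std::rand() / RAND_MAX >= p)
    refine(image, n);
}
```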
We will add a section to the eval describing the three undergrads' experience with using ACCEPT to optimize new applications from scratch. Ambitiously, they may even have time to modify or add optimizations that work well for their applications.
We need to get these guys taking notes so we have interesting things to discuss in this section.
Some quantitative results here would be very interesting: asking their opinions on a Likert scale or timing a task, for example.
A couple of other optimizations we've talked about in the past would also hit the trifecta of being (a) not challenging to implement, (b) likely to succeed, and (c) somewhat novel. They fall under the category of selecting among alternatives manually provided by the programmer.
Specifically:
- The programmer provides multiple values for a parameter (e.g., convergence criterion). We tune these parameters in tandem with perforation, NPU, etc.
- The programmer provides multiple implementations of a function. We choose which to call.
- Standard-library replacement. We include a library of relaxed implementations of STL functions, which can be used when their arguments are approximate. We choose whether to call the original or the relaxed version.
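A sketch of the general pattern, using the first two bullets (the names and the specific relaxed implementation are invented for illustration; the third bullet follows the same function-substitution shape):

```c++
#include <cmath>

// (1) Tunable parameter: the programmer lists acceptable values and the
// autotuner searches over them alongside perforation, NPU, etc.
static const float CONVERGENCE_CANDIDATES[] = {1e-6f, 1e-4f, 1e-2f};

// (2) Alternative implementations: a precise original plus a relaxed
// variant the programmer vouches for; the tuner chooses per call site.
float my_sqrt(float x) { return std::sqrt(x); }

float my_sqrt_relaxed(float x) {
  // One Newton step from a crude initial guess: cheaper, less accurate.
  float g = (x > 1.0f) ? x * 0.5f : 1.0f;
  return 0.5f * (g + x / g);
}
```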
It would be awesome if we could run the MSP-430 benchmark on real hardware.