-
Notifications
You must be signed in to change notification settings - Fork 305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIRRTL][LowerClasses] Improve performance, NFC #7060
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work thanks!
@@ -447,7 +602,7 @@ LogicalResult LowerClassesPass::processPaths( | |||
} | |||
|
|||
// If we are at a leaf, nothing to do. | |||
if (std::distance(it->begin(), it->end()) == 0) { | |||
if (it->begin() == it->end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm::size(A, B)
exists specifically to confidently do std::distance(..)
on ranges, just related tip.
For checking distance of zero, comparison seems good to me! 👍
b7b104d
to
915f5ac
Compare
// Fill `shouldCreateClassMemo`. | ||
for (auto moduleLike : circuit.getOps<FModuleLike>()) | ||
shouldCreateClassMemo.insert({moduleLike, false}); | ||
parallelForEach( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also check the single-threaded performance. Parallelization only improves walltime. We need to improve absolute time also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the single threaded performance on a large internal design, and this change leads to a >90% reduction in runtime in a single threaded context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 this is great, thank you for cleaning up the path stuff as you go, and for all the performance improvements. The use of instance graph instead of walking modules is great--that was one that I had top of mind.
if (isa<PortAnnoTarget>(target)) | ||
// Port annotations are batch-updated afterwards. | ||
portAnnotations.push_back(annotations.getArrayAttr()); | ||
else | ||
// Otherwise it's fine to update immediately. | ||
target.setAnnotations(annotations); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
target.setAnnotations(annotations);
could be made slightly faster with cast<OpAnnoTarget>(target)setAnnotations(annotations);
I don't love something about the way this sometimes mutates an internal structure and other times just sets the annotations. I think I would be happier if this function always appended to an annotation array, and op modifications happened outside this function, and then portAnnotations
doesn't need to be a member anymore. It would be even nicer if this whole function didn't care about the target type at all - but we need to know when we create the targetSym
. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed processPathTrackers to return FailureOr<AnnotationSet>
and mutate annotations from callers. Does that look good to you?
4a5e2e1
to
fdea037
Compare
fdea037
to
6380103
Compare
return true; | ||
|
||
// Create a class for modules with property ports. | ||
bool hasClassPorts = llvm::any_of(moduleLike.getPorts(), [](PortInfo port) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking out loud -- doing this in parallel when it walked the module body made sense, but maybe less so to walk the ports + instance nodes for each?
Something like:
"has property ports" can be computed once per module (and looked up instance-side, instead of re-computed, so tracked in a (temporary?) data structure), and "shouldCreateClassImpl" can then be determined in a single (serial) bottom-up walk of instance graph, computing "hasPropertyPorts" for module definitions along the way.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's great idea. We have to fix InstanceGraph to walk entire modules/classes for bottom-up walk so let me first merge the PR. I'll follow-up to use a bottom-up walk.
This PR fixes a regression caused by PortAnnotation mutation. We have to batch-update the port annotations to avoid compile time regression. Several other improves are added along with the fix:
shouldCreateClass
is slightly expensive since it walks entire IR to check instances.shouldCreateClass
is already lazily evaluated but howevershouldCreateClass
is immediately called for every module. So this PR changes to compute the all results parallelly beforehand instead of lazy evaluation.std::distance(a, b) == 0
is replaced witha == b
to avoid potential iterator traversals (maybe it is optimized by clang/gcc though).processPathTrackers
is refactored into a helper structPathTracker
so that we can process paths/annotations parallely. We updatepathInfoTable
sequentially afterwards.It should be NFC completely. Pass should behave exactly the same way.