Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIRRTL][LowerClasses] Improve performance, NFC #7060

Merged
merged 4 commits into from
May 21, 2024
Merged

Conversation

uenoku
Copy link
Member

@uenoku uenoku commented May 17, 2024

This PR fixes a regression caused by PortAnnotation mutation. We have to batch-update the port annotations to avoid compile time regression. Several other improves are added along with the fix:

  • shouldCreateClass is slightly expensive since it walks entire IR to check instances. shouldCreateClass is already lazily evaluated but however shouldCreateClass is immediately called for every module. So this PR changes to compute the all results parallelly beforehand instead of lazy evaluation.
  • std::distance(a, b) == 0 is replaced with a == b to avoid potential iterator traversals (maybe it is optimized by clang/gcc though).
  • processPathTrackers is refactored into a helper struct PathTracker so that we can process paths/annotations parallely. We update pathInfoTable sequentially afterwards.
  • Several operation walks were replaced with instance graph traversal.

It should be NFC completely. Pass should behave exactly the same way.

@uenoku uenoku requested a review from mikeurbach May 17, 2024 12:56
Copy link
Contributor

@dtzSiFive dtzSiFive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work thanks!

lib/Dialect/FIRRTL/Transforms/LowerClasses.cpp Outdated Show resolved Hide resolved
lib/Dialect/FIRRTL/Transforms/LowerClasses.cpp Outdated Show resolved Hide resolved
@@ -447,7 +602,7 @@ LogicalResult LowerClassesPass::processPaths(
}

// If we are at a leaf, nothing to do.
if (std::distance(it->begin(), it->end()) == 0) {
if (it->begin() == it->end()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm::size(A, B) exists specifically to confidently do std::distance(..) on ranges, just related tip.

For checking distance of zero, comparison seems good to me! 👍

@uenoku uenoku force-pushed the dev/hidetou/lower-class branch from b7b104d to 915f5ac Compare May 17, 2024 14:00
// Fill `shouldCreateClassMemo`.
for (auto moduleLike : circuit.getOps<FModuleLike>())
shouldCreateClassMemo.insert({moduleLike, false});
parallelForEach(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also check the single-threaded performance. Parallelization only improves walltime. We need to improve absolute time also.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the single threaded performance on a large internal design, and this change leads to a >90% reduction in runtime in a single threaded context.

Copy link
Contributor

@mikeurbach mikeurbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 this is great, thank you for cleaning up the path stuff as you go, and for all the performance improvements. The use of instance graph instead of walking modules is great--that was one that I had top of mind.

@youngar youngar added the FIRRTL Involving the `firrtl` dialect label May 17, 2024
@youngar youngar changed the title [LowerClasses] Improve performance, NFC [FIRRTL][LowerClasses] Improve performance, NFC May 17, 2024
Comment on lines 503 to 508
if (isa<PortAnnoTarget>(target))
// Port annotations are batch-updated afterwards.
portAnnotations.push_back(annotations.getArrayAttr());
else
// Otherwise it's fine to update immediately.
target.setAnnotations(annotations);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

target.setAnnotations(annotations); could be made slightly faster with cast<OpAnnoTarget>(target)setAnnotations(annotations);

I don't love something about the way this sometimes mutates an internal structure and other times just sets the annotations. I think I would be happier if this function always appended to an annotation array, and op modifications happened outside this function, and then portAnnotations doesn't need to be a member anymore. It would be even nicer if this whole function didn't care about the target type at all - but we need to know when we create the targetSym. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed processPathTrackers to return FailureOr<AnnotationSet> and mutate annotations from callers. Does that look good to you?

@uenoku uenoku force-pushed the dev/hidetou/lower-class branch from 4a5e2e1 to fdea037 Compare May 20, 2024 07:02
@uenoku uenoku force-pushed the dev/hidetou/lower-class branch from fdea037 to 6380103 Compare May 20, 2024 07:06
return true;

// Create a class for modules with property ports.
bool hasClassPorts = llvm::any_of(moduleLike.getPorts(), [](PortInfo port) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud -- doing this in parallel when it walked the module body made sense, but maybe less so to walk the ports + instance nodes for each?

Something like:
"has property ports" can be computed once per module (and looked up instance-side, instead of re-computed, so tracked in a (temporary?) data structure), and "shouldCreateClassImpl" can then be determined in a single (serial) bottom-up walk of instance graph, computing "hasPropertyPorts" for module definitions along the way.

WDYT?

Copy link
Member Author

@uenoku uenoku May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's great idea. We have to fix InstanceGraph to walk entire modules/classes for bottom-up walk so let me first merge the PR. I'll follow-up to use a bottom-up walk.

@uenoku uenoku merged commit 2b6184d into main May 21, 2024
4 checks passed
@uenoku uenoku deleted the dev/hidetou/lower-class branch May 21, 2024 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FIRRTL Involving the `firrtl` dialect
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants