[Pipelines] Additional unrolling in LTO (#536)
Some workloads require specific sequences of events to happen to fully simplify. This adds an extra full unrolling pass to help these cases on the cores with branch predictors. It helps produce simplified loops, which can then be SROA'd allowing further simplification, which can be important for performance. The feature adds extra compile time to get extra performance and is enabled by the opt flag 'extra-LTO-loop-unroll' (off by default). Original patch by David Green (david.green@arm.com)
1 parent 557d794 commit 41e8b9f
Showing 5 changed files with 67 additions and 23 deletions.
@@ -1,3 +1,4 @@
 -flto=full \
 -fvirtual-function-elimination \
--fwhole-program-vtables
+-fwhole-program-vtables \
+-mllvm -extra-LTO-loop-unroll=true
@@ -0,0 +1,9 @@
Additional optimization flags
=============================

## Additional loop unroll in the LTO pipeline
In some cases it is beneficial to run an additional loop unroll pass so that later passes, e.g. SROA, have more information to work with.
This can help code with deeply nested loops (N>=4 levels).

### Usage:
-mllvm -extra-LTO-loop-unroll=true/false
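As a rough illustration of the kind of code the description above has in mind, here is a minimal C++ sketch. The function `nested_sum` and its array shape are hypothetical and are not taken from the patch or from any benchmark. While the loops remain, the local array is indexed by loop variables, so it cannot be split into independent scalars; if all four loops are fully unrolled, every index becomes a constant, which is the form SROA can work with. Whether a given compiler build actually unrolls and scalarizes this function depends on its unrolling thresholds and pass ordering.

```cpp
// Hypothetical kernel, for illustration only (not part of the patch).
// Four nested loops with constant trip counts fill a small local array,
// and a second loop nest reads it back.
int nested_sum(int seed) {
    int acc[2][2][2][2];  // small local aggregate, a candidate for SROA

    // With the loops intact, acc is indexed by a, b, c and d,
    // so the optimizer has to treat it as one block of memory.
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            for (int c = 0; c < 2; ++c)
                for (int d = 0; d < 2; ++d)
                    acc[a][b][c][d] = seed + 8 * a + 4 * b + 2 * c + d;

    // After full unrolling, each access would use constant indices,
    // so each element could be promoted to its own scalar value.
    int sum = 0;
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            for (int c = 0; c < 2; ++c)
                for (int d = 0; d < 2; ++d)
                    sum += acc[a][b][c][d];

    return sum;  // equals 16 * seed + 120
}
```

Once every element is a scalar, the whole function can in principle fold down to simple arithmetic on `seed`, which is the further simplification the commit message refers to.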
This file was deleted.
55 changes: 55 additions & 0 deletions
patches/llvm-project-perf/0001-LTOpasses-add-loop-unroll.patch
@@ -0,0 +1,55 @@
From 4adfc5231d2c0182d6278b4aa75eec57648e5dd4 Mon Sep 17 00:00:00 2001
From: Vladi Krapp <vladi.krapp@arm.com>
Date: Tue, 3 Sep 2024 14:12:48 +0100
Subject: [Pipelines] Additional unrolling in LTO

Some workloads require specific sequences of events to happen
to fully simplify. This adds an extra full unrolling pass to help these
cases on the cores with branch predictors. It helps produce simplified
loops, which can then be SROA'd allowing further simplification, which
can be important for performance.
Feature adds extra compile time to get extra performance and
is enabled by the opt flag 'extra-LTO-loop-unroll' (off by default).

Original patch by David Green (david.green@arm.com)
---
 llvm/lib/Passes/PassBuilderPipelines.cpp | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 1184123c7710..6dc45d85927a 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -332,6 +332,10 @@ namespace llvm {
 extern cl::opt<unsigned> MaxDevirtIterations;
 } // namespace llvm

+static cl::opt<bool> LTOExtraLoopUnroll(
+    "extra-LTO-loop-unroll", cl::init(false), cl::Hidden,
+    cl::desc("Perform extra loop unrolling pass to assist SROA"));
+
 void PassBuilder::invokePeepholeEPCallbacks(FunctionPassManager &FPM,
                                             OptimizationLevel Level) {
   for (auto &C : PeepholeEPCallbacks)
@@ -1940,6 +1944,18 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
   MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(ArgumentPromotionPass()));

   FunctionPassManager FPM;
+
+  if (LTOExtraLoopUnroll) {
+    LoopPassManager OmaxLPM;
+    OmaxLPM.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),
+                                       /* OnlyWhenForced= */ !PTO.LoopUnrolling,
+                                       PTO.ForgetAllSCEVInLoopUnroll));
+    FPM.addPass(
+        createFunctionToLoopPassAdaptor(std::move(OmaxLPM),
+                                        /*UseMemorySSA=*/false,
+                                        /*UseBlockFrequencyInfo=*/true));
+  }
+
   // The IPO Passes may leave cruft around. Clean up after them.
   FPM.addPass(InstCombinePass());
   invokePeepholeEPCallbacks(FPM, Level);
--
2.34.1