[Pipelines] Additional unrolling in LTO (#536)
Some workloads require specific sequences of events to happen
to fully simplify. This adds an extra full unrolling pass to help these
cases on cores with branch predictors. It helps produce simplified
loops, which can then be SROA'd, allowing further simplification that
can be important for performance.

The feature trades extra compile time for extra performance and
is enabled by the opt flag 'extra-LTO-loop-unroll' (off by default).

Original patch by David Green (david.green@arm.com)
VladiKrapp-Arm authored Oct 16, 2024
1 parent 557d794 commit 41e8b9f
Showing 5 changed files with 67 additions and 23 deletions.
3 changes: 2 additions & 1 deletion OmaxLTO.cfg
@@ -1,3 +1,4 @@
 -flto=full \
 -fvirtual-function-elimination \
--fwhole-program-vtables
+-fwhole-program-vtables \
+-mllvm -extra-LTO-loop-unroll=true
1 change: 1 addition & 0 deletions README.md
@@ -172,6 +172,7 @@ and/or increased memory usage during linking. Some of the options in the config
corresponding optimisation passes in the [LLVM project](https://github.com/llvm/llvm-project)
to find out more. Users are also encouraged to create their own configs and tune their own
flag parameters.
Information on optimization flags specific to the LLVM Embedded Toolchain for Arm is available in [Optimization Flags](https://github.com/ARM-software/LLVM-embedded-toolchain-for-Arm/blob/main/docs/optimization-flags.md).

Binary releases of the LLVM Embedded Toolchain for Arm are based on release
branches of the upstream LLVM Project, thus can safely be used with all tools
9 changes: 9 additions & 0 deletions docs/optimization-flags.md
@@ -0,0 +1,9 @@
Additional optimization flags
=============================

## Additional loop unroll in the LTO pipeline
In some cases it is beneficial to perform an additional loop unroll pass so that extra information becomes available to later passes, e.g. SROA.
A use case where this could be beneficial is code with multiple (N>=4) nested loops.

### Usage:
-mllvm -extra-LTO-loop-unroll=true/false
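
### Illustration
The sketch below is a simplified, hypothetical illustration of why full unrolling can help SROA; it is not taken from the patch or from any measured workload. The function names `sum_rolled` and `sum_unrolled` are invented for this example, and the loop is a single level rather than the N>=4 nest mentioned above. In the rolled form, the local array is indexed by a loop variable, so SROA cannot split it into independent scalars. Once the loops are fully unrolled, every index is a constant and the array can, in principle, be replaced by scalars, which is what the hand-unrolled version shows. Whether the compiler actually performs this transformation depends on its unrolling thresholds and the surrounding code.

```cpp
#include <cstddef>

// Rolled form: `acc` is indexed by the loop variable `i`. While the index
// is not a compile-time constant, the array has to stay a single aggregate.
int sum_rolled(const int (&in)[4]) {
    int acc[4] = {0, 0, 0, 0};
    for (std::size_t i = 0; i < 4; ++i)
        acc[i] = in[i] * 2;

    int total = 0;
    for (std::size_t i = 0; i < 4; ++i)
        total += acc[i];
    return total;
}

// Hand-written equivalent of what full unrolling followed by SROA could
// produce: each array element has become an independent scalar.
int sum_unrolled(const int (&in)[4]) {
    int acc0 = in[0] * 2;
    int acc1 = in[1] * 2;
    int acc2 = in[2] * 2;
    int acc3 = in[3] * 2;
    return acc0 + acc1 + acc2 + acc3;
}
```

Both functions compute the same result; the second form only makes explicit the scalars that later passes would be able to simplify further.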
22 changes: 0 additions & 22 deletions patches/llvm-project-perf/0000-Placeholder-commit.patch

This file was deleted.

55 changes: 55 additions & 0 deletions patches/llvm-project-perf/0001-LTOpasses-add-loop-unroll.patch
@@ -0,0 +1,55 @@
From 4adfc5231d2c0182d6278b4aa75eec57648e5dd4 Mon Sep 17 00:00:00 2001
From: Vladi Krapp <vladi.krapp@arm.com>
Date: Tue, 3 Sep 2024 14:12:48 +0100
Subject: [Pipelines] Additional unrolling in LTO

Some workloads require specific sequences of events to happen
to fully simplify. This adds an extra full unrolling pass to help these
cases on the cores with branch predictors. It helps produce simplified
loops, which can then be SROA'd allowing further simplification, which
can be important for performance.
Feature adds extra compile time to get extra performance and
is enabled by the opt flag 'extra-LTO-loop-unroll' (off by default).

Original patch by David Green (david.green@arm.com)
---
llvm/lib/Passes/PassBuilderPipelines.cpp | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 1184123c7710..6dc45d85927a 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -332,6 +332,10 @@ namespace llvm {
 extern cl::opt<unsigned> MaxDevirtIterations;
 } // namespace llvm
 
+static cl::opt<bool> LTOExtraLoopUnroll(
+    "extra-LTO-loop-unroll", cl::init(false), cl::Hidden,
+    cl::desc("Perform extra loop unrolling pass to assist SROA"));
+
 void PassBuilder::invokePeepholeEPCallbacks(FunctionPassManager &FPM,
                                             OptimizationLevel Level) {
   for (auto &C : PeepholeEPCallbacks)
@@ -1940,6 +1944,18 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
   MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(ArgumentPromotionPass()));
 
   FunctionPassManager FPM;
+
+  if (LTOExtraLoopUnroll) {
+    LoopPassManager OmaxLPM;
+    OmaxLPM.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),
+                                       /* OnlyWhenForced= */ !PTO.LoopUnrolling,
+                                       PTO.ForgetAllSCEVInLoopUnroll));
+    FPM.addPass(
+        createFunctionToLoopPassAdaptor(std::move(OmaxLPM),
+                                        /*UseMemorySSA=*/false,
+                                        /*UseBlockFrequencyInfo=*/true));
+  }
+
   // The IPO Passes may leave cruft around. Clean up after them.
   FPM.addPass(InstCombinePass());
   invokePeepholeEPCallbacks(FPM, Level);
--
2.34.1
