[Notes] LICM (Loop Invariant Code Motion) and its implementation in LLVM

Posted on 2021-04-20 Edited on 2021-04-24

A Simple Lead-in

before LICM

T *p = /* loop invariant address */
for (init; cond; inc) {
  a = /* loop invariant */;
  c = a + d;
  *p = x;
  outlive = /* loop invariant */;
}

after LICM

a = /* loop invariant */; // hoist
T t = *p;
for (init; cond; inc) {
  c = a + d;
  t = x;
}
outlive = /* loop invariant */; // sink
*p = t; // promote memory accesses to scalars

Loop

Dominators

Node \(a\) dominates node \(b\) \((a \operatorname{dom} b)\) in a graph if every path from the entry node to \(b\) goes through \(a\).

Node \(a\) immediately dominates node \(b\) \((a \operatorname{idom} b)\) if \(\nexists c: a \operatorname{dom} c \wedge c \operatorname{dom} b\).

The (immediately) dominant relationships between nodes form a tree with the entry node as its root.
The dominator tree of a flow graph can be constructed using an algorithm developed by Lengauer and Tarjan with \(O(|E|\alpha(|E|, |V|))\) complexity

Back Edges

back edge: an edge \((\textit{tail} \rightarrow \textit{head})\) with condition \(\textit{head} \operatorname{dom} \textit{tail}\) satisfied.

Natural Loops

the natural loop of a back edge \((\textit{tail} \rightarrow \textit{head})\): all nodes that can reach \(\textit{tail}\) after removing \(\textit{head}\), including \(\textit{head}\) itself. the \(\textit{head}\) node is its single entry-point.

Find Loops

A intuitional simple process:

Construct the dominator tree of the control flow graph
Find out the back edges
and the natural loops associated with them

more details:

Traverse the tree in the depth-first order (Depth-first spanning tree)
When traverse an edge \(\textit{tail} \rightarrow \textit{head}\) where \(\textit{head}\) is an ancestor of \(\textit{tail}\) in the depth-first spanning tree, check whether \(\textit{head}\) dominates \(\textit{tail}\). If true, the edge is a back edge.
Delete \(\textit{head}\) from the graph, and find nodes that can reach \(\textit{tail}\) still.

Loop Invariant

Loop-Invariant Computation: a computation which computes the same result each time evaluated. The result of a Loop-Invariant Computation is a Loop Invariant.

Detecting Loop Invariants

Basic Cases
- it's a constant
- all of it's defs are outside the loop (read-only in the loop)
Inductive Cases
- it’s an idempotent computation all of whose args are Loop Invariants
- there is exactly one reaching def for it in the loop, and the def's rhs is a Loop Invariant
Repeat this process, until no more Loop Invariants can be detected.

Code Motion

Types of Code Motion
- hoist instructions to the loop preheader (the predecessor of the loop header), if used within loop
- sink instructions to the exit blocks of the loop, if only used after loop
- extra: promote memory accesses to scalars (promoteLoopAccessesToScalars)
Conditions
- Aimed at improving performance of the program
  - the instruction should not be executed more often after motion
  - before moving an instruction, check if the destination block is "colder" than the original block. (worthSinkOrHoistInst)
- Can not change semantics of the program
  - the candidate instruction should dominates all the exits of the loop (otherwise it may be wrongly executed, e.g. the code in a conditional branch)
  - and should dominates all its users in the entire loop (when hoisting)
  - the destination of the instruction must dominates all its users too
some more aggressive optimizations

Algorithm

depth-first search on the control-flow graph, sink or hoist candidate instructions if all the invariant operations it depends upon have been moved

Implementation in LLVM

fn LICMPass::run()
/// line 277

fn LoopInvariantCodeMotion::runOnLoop()
/// line 348, call on line 285

fn llvm::sinkRegion()
/// line 530, call on line 412
/// Walk the specified region of the CFG (defined by all blocks dominated by
/// the specified block, and that are in the current loop) in reverse depth
/// first order w.r.t the DominatorTree.  This allows us to visit uses before
/// definitions, allowing us to sink a loop body in one pass without iteration.

fn worthSinkOrHoistInst()
/// line 831, call on line 923
// Hoisting/sinking instruction out of a loop isn't always beneficial. It's only
// only worthwhile if the destination block is actually colder than current
// block.

fn llvm::hoistRegion()
/// line 862, call on line 416
/// Walk the specified region of the CFG (defined by all blocks dominated by
/// the specified block, and that are in the current loop) in depth first
/// order w.r.t the DominatorTree.  This allows us to visit definitions before
/// uses, allowing us to hoist a loop body in one pass without iteration.

fn isHoistableAndSinkableInst()
/// line 1118, call on line 1175
/// Return true if-and-only-if we know how to (mechanically) both hoist and
/// sink a given instruction out of a loop.  Does not address legality
/// concerns such as aliasing or speculation safety.

fn canSinkOrHoistInst()
/// line 1165, call on line 921

fn sinkThroughTriviallyReplaceablePHI()
/// line 1556, call on line 1768

fn sink()
/// line 1665, call on line 583
/// When an instruction is found to only be used outside of the loop, this
/// function moves it to the exit blocks and patches up SSA form as needed.
/// This method is guaranteed to remove the original instruction from its
/// position, and may either delete it or move it to outside of the loop.

fn hoist()
/// line 1780, call on line 927
/// When an instruction is found to only use loop invariant operands that
/// is safe to hoist, this instruction is called to do the dirty work.

fn llvm::promoteLoopAccessesToScalars()
/// line 1987, call on line 489

fn collectPromotionCandidates()
/// line 2275, call on line 488

fn isLoadInvariantInLoop()
/// line 1050, call on line 1198
// Return true if LI is invariant within scope of the loop. LI is invariant if
// CurLoop is dominated by an invariant.start representing the same memory
// location and size as the memory location LI loads from, and also the
// invariant.start has no uses.

fn isReadOnly()
/// line 1130, call on line 1276
/// Return true if all of the alias sets within this AST are known not to
/// contain a Mod, or if MSSA knows thare are no MemoryDefs in the loop.

fn isOnlyMemoryAccess()
/// line 1148, call on line 1301
/// Return true if I is the only Instruction with a MemoryAccess in L.

fn isTriviallyReplaceablePHI()
/// line 1387, call on line 1703
/// Returns true if a PHINode is a trivially replaceable with an Instruction.
/// This is true when all incoming values are that instruction.
/// This pattern occurs most often with LCSSA PHI nodes.

fn isNotUsedOrFreeInLoop()
/// line 1426, call on line 580
/// Return true if the only users of this instruction are outside of the loop.
/// If this is true, we can sink the instruction to the exit blocks of the loop.
/// We also return true if the instruction could be folded away in lowering.
/// (e.g.,  a GEP can be folded into a load as an addressing mode in the loop).

fn isFreeInLoop()
/// line 1396, call on line 1430
/// Return true if the instruction is free in the loop.

fn cloneInstructionInExitBlock()
/// line 1458, call on line 1569

fn eraseInstruction()
/// line 1530, call on line 908

fn moveInstructionBefore()
/// line 1540, call on line 1804

fn canSplitPredecessors()
/// line 1574, call on line 1706

fn splitPredecessorsOfLoopExit()
/// line 1593, call on line 1711

fn isSafeToExecuteUnconditionally()
/// line 1821, call on line 924
/// Only sink or hoist an instruction if it is not a trapping instruction,
/// or if the instruction is known not to trap when moved to the preheader.
/// or if it is a trapping instruction and is guaranteed to execute.

fn isKnownNonEscaping()
/// line 1960, call on line 2063

fn foreachMemoryAccess()
/// line 2265, call on line 477

fn LoopInvariantCodeMotion::collectAliasInfoLoop()
/// line 2331, call on line 384
/// Returns an owning pointer to an alias set which incorporates aliasing info from L and all subloops of L.

fn pointerInvalidatedByLoop()
/// line 2349, call on line 1207

fn pointerInvalidatedByLoopWithMSSA()
/// line 2396, call on line 1210

fn pointerInvalidatedByBlockWithMSSA()
/// line 2442, call on line 1210

fn inSubLoop()
/// line 2455, call on line 555
/// Little predicate that returns true if the specified basic block is in
/// a subloop of the current one, not the current one itself.