## Lecture 20: Branches, OOO



- Branch prediction
- Out-of-order execution
- (Also see class notes on pipelining, hazards, etc.)



#### Problem 4

A 7- or 9-stage pipeline; RR and RW each take an entire stage.

Without bypassing: 4 stalls (each extra DE is a stall cycle)

IF:IF:DE:DE:RR:AL:DM:DM:RW

IF:IF:DE:DE:DE:DE:DE:DE:RR:AL:RW

With bypassing: 2 stalls

IF:IF:DE:DE:RR:AL:DM:DM:RW

IF:IF:DE:DE:DE:DE:RR:AL:RW
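The stall counts above can be checked with a short calculation. This is only a sketch under stated assumptions that are not spelled out on the slide: the consumer issues one cycle after the producer, without bypassing the value is usable only after the producer's RW stage, and with bypassing it is usable after the producer's last DM stage. The helper name `stalls` is hypothetical.

```python
# Stage sequences from Problem 4 (one stage per cycle).
PRODUCER = ["IF", "IF", "DE", "DE", "RR", "AL", "DM", "DM", "RW"]  # 9 stages
CONSUMER = ["IF", "IF", "DE", "DE", "RR", "AL", "RW"]              # 7 stages


def stalls(produce_stage_index, consume_stage, issue_gap=1):
    """Stall cycles needed so the consumer enters `consume_stage` only after
    the producer has finished the stage at `produce_stage_index`.

    Assumption: the consumer issues `issue_gap` cycles after the producer.
    """
    # The producer starts in cycle 1, so it occupies stage i during cycle i + 1.
    ready_cycle = produce_stage_index + 1
    # Without stalls, the consumer occupies stage j during cycle 1 + issue_gap + j.
    unstalled_cycle = 1 + issue_gap + CONSUMER.index(consume_stage)
    return max(0, ready_cycle + 1 - unstalled_cycle)


# No bypassing: the value is available after the producer's RW,
# and the consumer reads it in RR.
no_bypass = stalls(PRODUCER.index("RW"), "RR")

# Bypassing (assumed): the value is available after the producer's last DM,
# and the consumer needs it at the start of AL.
last_dm = len(PRODUCER) - 1 - PRODUCER[::-1].index("DM")
with_bypass = stalls(last_dm, "AL")

print(no_bypass, with_bypass)  # 4 2
```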



## Pipelining Example (Recap)



- Unpipelined design: the entire circuit takes 10ns to finish

  Cycle time = 10ns; Clock speed = 1/10ns = 100 MHz

  CPI = 1 (assuming no stalls)

  Throughput in instructions per second = 
  #cycles in a second x instructions-per-cycle = 
  100 M x 1 = 100 M instrs per second = 0.1 BIPS (billion instrs per sec)
- 5-stage pipeline: under ideal conditions, each stage takes 2ns

  Cycle time = 2ns; Clock speed = 1/2ns = 500 MHz (5x higher)

  CPI = 1 (continuing to assume no stalls)

  Throughput = # cycles in a second x instrs-per-cycle

  = 500 M x 1 = 500 MIPS = 0.5 BIPS

Under ideal conditions, a 5-stage pipeline gives a 5x speedup.
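The arithmetic in this recap can be reproduced in a few lines. This is a minimal sketch; the function name `throughput_ips` is made up for illustration, and it assumes throughput is simply clock frequency divided by CPI.

```python
def throughput_ips(cycle_time_ns, cpi=1.0):
    """Instructions per second for a given cycle time (in ns) and CPI."""
    clock_hz = 1e9 / cycle_time_ns  # cycles per second
    return clock_hz / cpi


unpipelined = throughput_ips(10)  # 10 ns cycle -> 100 MHz
pipelined = throughput_ips(2)     # 2 ns cycle  -> 500 MHz

# Throughput in BIPS, then the speedup from pipelining.
print(unpipelined / 1e9, pipelined / 1e9, pipelined / unpipelined)  # 0.1 0.5 5.0
```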



## Control Hazards

(Figure: pipeline diagram for a branch and the instructions that follow it.)

- When the branch is NT (not taken): no bubble
- When the branch is T (taken): 1 bubble (made a bad guess)



Source: H&P textbook

## Pipeline without Branch Predictor



## Pipeline with Branch Predictor



### **Bimodal Predictor**



#### 2-Bit Prediction

- For each branch, maintain a 2-bit saturating counter:
   if the branch is taken: counter = min(3,counter+1)
   if the branch is not taken: counter = max(0,counter-1)
   ... sound familiar?
- If (counter >= 2), predict taken, else predict not taken
- The counter attempts to capture the common case for each branch

Other topics: indexing functions; multiple branch predictors; history and trade-offs.
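The 2-bit saturating counter rule can be sketched as follows. The class and function names are hypothetical, the initial counter value of 0 is an assumption, and the branch outcome sequence is invented to resemble a loop branch (taken 9 times, then not taken, with the loop run twice).

```python
class TwoBitCounter:
    """2-bit saturating counter for a single branch."""

    def __init__(self, state=0):
        self.state = state  # assumed initial value; ranges over 0..3

    def predict(self):
        # counter >= 2 means "predict taken"
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, counter):
    """Run the counter over a sequence of actual outcomes (True = taken)."""
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# Hypothetical loop branch: taken 9 times, then falls through; loop runs twice.
outcomes = ([True] * 9 + [False]) * 2
misses = count_mispredictions(outcomes, TwoBitCounter(state=0))
print(misses)  # 4: two warm-up misses, plus one miss at each loop exit
```

After warm-up, the single not-taken outcome at each loop exit only moves the counter from 3 to 2, so the next entry into the loop is still predicted taken.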

#### Slowdowns from Stalls

- Perfect pipelining with no hazards → an instruction completes every cycle (total cycles ~ num instructions)
   → speedup = increase in clock speed = num pipeline stages
- With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes
- Total cycles = number of instructions + stall cycles
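The effect of stalls on speedup follows directly from the cycle count above. This is a rough sketch with assumptions not stated on the slide: the unpipelined design has CPI = 1 and a cycle time equal to the number of stages times the pipelined cycle time, and pipeline fill time is ignored. The function name and the instruction and stall counts are invented for illustration.

```python
def pipelined_speedup(num_instructions, stall_cycles, num_stages):
    """Speedup over an unpipelined design, given stall cycles in the pipeline."""
    total_cycles = num_instructions + stall_cycles
    cpi = total_cycles / num_instructions
    # Clock is num_stages times faster, but each instruction takes cpi cycles.
    return num_stages / cpi


print(pipelined_speedup(1000, 0, 5))    # 5.0 (no stalls: the ideal case)
print(pipelined_speedup(1000, 250, 5))  # 4.0 (CPI = 1.25)
```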

### Multicycle Instructions



© 2003 Elsevier Science (USA). All rights reserved.

- Multiple parallel pipelines; each pipeline can have a different number of stages
- Instructions can now complete out of order, so we must make sure that writes to a register happen in the correct order
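To illustrate why write order matters, here is a small sketch that finds pairs of instructions whose writes to the same register would complete out of program order. The latencies, unit names, and the three-instruction program are all hypothetical, and the model assumes one instruction issues per cycle with no stalls.

```python
# Hypothetical execute latencies (in cycles) for each parallel pipeline.
LATENCY = {"int": 1, "fp_add": 4, "fp_mul": 7}


def completion_cycles(program):
    """program: list of (name, unit, dest_register) in program order.

    Returns (name, dest, completion_cycle), assuming one issue per cycle.
    """
    return [
        (name, dest, issue + LATENCY[unit])
        for issue, (name, unit, dest) in enumerate(program, start=1)
    ]


def out_of_order_writes(program):
    """Pairs (earlier, later) that write the same register where the later
    instruction would complete no later than the earlier one."""
    done = completion_cycles(program)
    hazards = []
    for i, (name1, dest1, cycle1) in enumerate(done):
        for name2, dest2, cycle2 in done[i + 1:]:
            if dest1 == dest2 and cycle2 <= cycle1:
                hazards.append((name1, name2))
    return hazards


program = [
    ("mul", "fp_mul", "f2"),  # completes in cycle 8
    ("add", "fp_add", "f4"),  # completes in cycle 6
    ("mov", "int", "f2"),     # completes in cycle 4, before the earlier mul
]
print(out_of_order_writes(program))  # [('mul', 'mov')]
```

In this example, letting `mul` write f2 after `mov` would leave the wrong final value in the register, which is the ordering problem the bullet above describes.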