## Lecture 18: Pipelining

- Today's topics:
  - Hazards and instruction scheduling
  - Branch prediction
  - Out-of-order execution

## Example 2 – Bypassing

Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2→R3, I2 is R3+R4→R5, and I3 is R3+R8→R9. Identify the input latch for each input operand.


| Stage | CYC-1 | CYC-2 | CYC-3 | CYC-4 | CYC-5 | CYC-6 | CYC-7 | CYC-8 |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| IF    | I1    | I2    | I3    | I4    | I5    |       |       |       |
| D/R   |       | I1    | I2    | I3    | I4    |       |       |       |
| ALU   |       |       | I1    | I2    | I3    |       |       |       |
| DM    |       |       |       | I1    | I2    | I3    |       |       |
| RW    |       |       |       |       | I1    | I2    | I3    |       |
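The "identify the input latch" part can be checked mechanically. The sketch below models the stall-free 5-stage pipeline from the table and, for each source operand, finds the nearest earlier instruction that writes that register. The latch labels (`after-ALU latch` and so on) are hypothetical names rather than the lecture's, and the sketch assumes the register file is written in the first half of RW and read in the second half of D/R.

```python
# Sketch: where each ALU input operand comes from in a 5-stage pipeline
# (IF, D/R, ALU, DM, RW) with full bypassing and no stalls.
# Assumptions (not from the slide): latch labels are hypothetical, and the
# register file is written in the first half of RW and read in the second
# half of D/R.

STAGES = ["IF", "D/R", "ALU", "DM", "RW"]

# (name, source registers, destination register)
PROGRAM = [
    ("I1", ("R1", "R2"), "R3"),
    ("I2", ("R3", "R4"), "R5"),
    ("I3", ("R3", "R8"), "R9"),
]


def occupancy(n_instr, n_cycles):
    """Map each stage to the instruction it holds in each cycle (no stalls)."""
    table = {stage: [""] * n_cycles for stage in STAGES}
    for k in range(n_instr):
        for s, stage in enumerate(STAGES):
            cycle = k + s  # 0-based: instruction k enters IF in cycle k
            if cycle < n_cycles:
                table[stage][cycle] = f"I{k + 1}"
    return table


def operand_sources(program):
    """Return {(instr, reg): latch} for every source operand.

    When consumer k is in ALU, an earlier producer j is (k - j) stages ahead:
      distance 1  -> producer is in DM; its result sits in the latch after ALU
      distance 2  -> producer is in RW; its result sits in the latch after DM
      distance 3+ -> the value was read from the register file in D/R and
                     reaches ALU through the latch after D/R
    """
    sources = {}
    for k, (name, srcs, _dst) in enumerate(program):
        for reg in srcs:
            latch = "after-D/R latch (register file)"
            # Scan backwards: the most recent earlier writer of reg wins.
            for j in range(k - 1, -1, -1):
                if program[j][2] == reg:
                    distance = k - j
                    if distance == 1:
                        latch = "after-ALU latch (bypass)"
                    elif distance == 2:
                        latch = "after-DM latch (bypass)"
                    break
            sources[(name, reg)] = latch
    return sources


if __name__ == "__main__":
    # Occupancy for the three instructions named in the question.
    for stage, row in occupancy(3, 8).items():
        print(f"{stage:4}", " ".join(f"{cell:3}" for cell in row))
    for (instr, reg), latch in operand_sources(PROGRAM).items():
        print(f"{instr} {reg}: {latch}")
```

Under these assumptions, I2 receives R3 from the latch after ALU (I1 is in DM during CYC-4), I3 receives R3 from the latch after DM (I1 is in RW during CYC-5), and every other operand comes from the register file.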







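#### A 7- or 9-stage pipeline

Consider a load followed by an add that uses the loaded value:

`lw $1, 8($2)`

`add $4, $1, $3`

The load passes through all nine stages (IF, IF, DE, DE, RR, AL, DM, DM, RW); ALU instructions such as the add skip the DM stages and use seven.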


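Without bypassing: 4 stalls (the add's RR waits until the cycle after the load's RW)

| Cycle | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 |
|-------|----|----|----|----|----|----|----|----|----|----|----|----|
| `lw`  | IF | IF | DE | DE | RR | AL | DM | DM | RW |    |    |    |
| `add` |    | IF | IF | DE | DE | DE | DE | DE | DE | RR | AL | RW |

With bypassing: 2 stalls (the loaded value is forwarded from the end of the second DM stage into the add's AL stage)

| Cycle | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 |
|-------|----|----|----|----|----|----|----|----|----|----|
| `lw`  | IF | IF | DE | DE | RR | AL | DM | DM | RW |    |
| `add` |    | IF | IF | DE | DE | DE | DE | RR | AL | RW |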
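A small sketch that recomputes the stall counts for this load-use pair. It encodes two timing assumptions, chosen because they reproduce the stated 4 and 2 stalls: without bypassing, the add's RR must happen in the cycle after the load's RW; with bypassing, the loaded value is forwarded from the end of the load's second DM stage into the add's AL stage. As in the diagrams, each stall is shown as an extra DE cycle.

```python
# Sketch: stall count for `lw $1, 8($2)` followed by `add $4, $1, $3`
# in the 7-/9-stage pipeline. Assumed timing rules (chosen to match the
# slide's stall counts):
#   - no bypassing: the add's RR must come after the load's RW
#   - bypassing: the loaded value goes from the end of the second DM to AL

LOAD = ["IF", "IF", "DE", "DE", "RR", "AL", "DM", "DM", "RW"]  # 9 stages
ALU_OP = ["IF", "IF", "DE", "DE", "RR", "AL", "RW"]            # 7 stages
LOAD_START, ADD_START = 1, 2  # cycle of each instruction's first IF


def stall_cycles(bypass: bool) -> int:
    load_rw = LOAD_START + LOAD.index("RW")                              # cycle 9
    load_last_dm = LOAD_START + len(LOAD) - 1 - LOAD[::-1].index("DM")   # cycle 8
    add_rr = ADD_START + ALU_OP.index("RR")  # cycle 6 if there are no stalls
    add_al = ADD_START + ALU_OP.index("AL")  # cycle 7 if there are no stalls
    if bypass:
        # AL may start in the cycle right after the last DM.
        return max(0, (load_last_dm + 1) - add_al)
    # RR may start only in the cycle right after RW.
    return max(0, (load_rw + 1) - add_rr)


def add_timeline(bypass: bool) -> str:
    """The add's stages, with each stall drawn as an extra DE cycle."""
    stalls = stall_cycles(bypass)
    stages = ALU_OP[:4] + ["DE"] * stalls + ALU_OP[4:]
    return ":".join(stages)


if __name__ == "__main__":
    for bypass in (False, True):
        label = "with bypassing" if bypass else "without bypassing"
        print(f"{label}: {stall_cycles(bypass)} stalls, add = {add_timeline(bypass)}")
```

The difference between the two cases is simply how late the value becomes usable: after RW (register file) versus after the second DM (bypass), which saves two cycles here.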



#### Control Hazards

- Simple techniques to handle control hazard stalls:
  - for every branch, introduce a stall cycle (note: roughly 1 in every 6 instructions is a branch!)
  - assume the branch is not taken and start fetching the next instruction; if the branch is taken, hardware must cancel the effect of the wrong-path instruction
  - fetch the next instruction (the branch delay slot) and execute it anyway: if the instruction turns out to be on the correct path, useful work was done; if it turns out to be on the wrong path, hopefully program state is not lost
  - make a smarter guess and fetch instructions from the expected target
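To see what the stall-per-branch policy costs, here is a back-of-the-envelope CPI sketch comparing it with predict-not-taken. It assumes a base CPI of 1, a 1-cycle penalty per stalled or cancelled branch, and the slide's 1-in-6 branch frequency; the 60% taken rate is a hypothetical value, not from the lecture.

```python
# Sketch: average CPI under two of the simple control-hazard policies.
# Assumptions: base CPI of 1 (ideal pipeline), 1-cycle penalty, branch
# frequency of 1 in 6 (from the slide), and a HYPOTHETICAL 60% taken rate.

BRANCH_FRACTION = 1 / 6
TAKEN_FRACTION = 0.6   # hypothetical, not from the lecture
PENALTY = 1            # stall cycles per stalled or squashed branch (assumed)


def cpi_stall_every_branch(branch_frac=BRANCH_FRACTION, penalty=PENALTY):
    # Every branch pays the penalty, taken or not.
    return 1 + branch_frac * penalty


def cpi_predict_not_taken(branch_frac=BRANCH_FRACTION,
                          taken_frac=TAKEN_FRACTION, penalty=PENALTY):
    # Only taken branches pay: the fetched wrong-path instruction is cancelled.
    return 1 + branch_frac * taken_frac * penalty


if __name__ == "__main__":
    print(f"stall every branch: CPI = {cpi_stall_every_branch():.3f}")
    print(f"predict not taken : CPI = {cpi_predict_not_taken():.3f}")
```

With these numbers, stalling on every branch costs about 17% (CPI ≈ 1.167), while predict-not-taken loses cycles only on taken branches (CPI = 1.1).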

#### Branch Delay Slots




Source: H&P textbook

## Pipeline without Branch Predictor



## Pipeline with Branch Predictor



#### 2-Bit Prediction

- For each branch, maintain a 2-bit saturating counter:
  - if the branch is taken: `counter = min(3, counter + 1)`
  - if the branch is not taken: `counter = max(0, counter - 1)`
  - ... sound familiar?
- If (counter >= 2), predict taken, else predict not taken
- The counter attempts to capture the common case for each branch
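The counter update and prediction rules above translate directly into code. This minimal sketch assumes an initial counter value of 0, which the slide does not specify, and replays a loop branch that is taken three times and then falls through, for three visits to the loop.

```python
# Sketch of the 2-bit saturating counter described above.
# Assumption: the counter starts at 0 (strongly not taken); the slide does
# not specify an initial value.


class TwoBitCounter:
    def __init__(self, initial: int = 0):
        self.value = initial

    def predict(self) -> bool:
        # 2 or 3 -> predict taken; 0 or 1 -> predict not taken
        return self.value >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.value = min(3, self.value + 1)
        else:
            self.value = max(0, self.value - 1)


def count_mispredictions(outcomes, initial: int = 0) -> int:
    counter = TwoBitCounter(initial)
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


if __name__ == "__main__":
    # Loop branch: taken 3 times, then not taken; the loop is visited 3 times.
    pattern = ([True] * 3 + [False]) * 3
    print(count_mispredictions(pattern))  # 5
```

After two warm-up mispredictions, the counter mispredicts only the loop exit: a single not-taken outcome moves it from 3 to 2, which still predicts taken, so the first branch of the next loop visit is predicted correctly.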

#### Bimodal Predictor
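A bimodal predictor applies the same 2-bit saturating counter to many branches by keeping a table of counters indexed by low-order bits of the branch PC. The sketch below is illustrative only: the table size (1024 entries), the initial counter value (1, weakly not taken), dropping the two low PC bits, and the branch addresses are all assumptions, not details from the slide.

```python
# Sketch of a bimodal predictor: a table of 2-bit saturating counters indexed
# by low-order bits of the branch PC. Assumptions (illustrative): 1024
# entries, counters start at 1, and the two low PC bits are dropped because
# instructions are assumed to be 4-byte aligned.


class BimodalPredictor:
    def __init__(self, index_bits: int = 10, initial: int = 1):
        self.mask = (1 << index_bits) - 1
        self.counters = [initial] * (1 << index_bits)

    def index(self, pc: int) -> int:
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


if __name__ == "__main__":
    bp = BimodalPredictor()
    loop_branch, rare_branch = 0x400100, 0x400104  # hypothetical branch PCs
    for _ in range(2):
        bp.update(loop_branch, taken=True)
        bp.update(rare_branch, taken=False)
    print(bp.predict(loop_branch), bp.predict(rare_branch))  # True False
    # PCs 4096 bytes apart map to the same counter (aliasing).
    print(bp.index(loop_branch) == bp.index(loop_branch + 4096))  # True
```

Because only the low-order PC bits are used, two branches whose addresses differ by a multiple of 4096 bytes share one counter and can interfere with each other.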



#### Slowdowns from Stalls

- Perfect pipelining with no hazards → an instruction completes every cycle (total cycles ≈ number of instructions)
  - → speedup = increase in clock speed = number of pipeline stages (ideally, with perfectly balanced stages and no latch overhead)
- With hazards and stalls, some cycles (= stall time) go by during which no instruction completes; then the stalled instruction completes
- Total cycles = number of instructions + stall cycles
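The cycle accounting above can be written as two small formulas: total cycles = N + stall cycles, and speedup over an unpipelined design = N × stages / (N + stall cycles). The sketch assumes, as the slide's "≈" suggests, that pipeline fill cycles are ignored, and that the unpipelined clock period equals `num_stages` pipelined cycles (balanced stages, no latch overhead); the example stall count is hypothetical.

```python
# Sketch of the cycle accounting above. Assumptions: pipeline fill/drain
# cycles are ignored, and the unpipelined clock period equals num_stages
# pipelined cycles (perfectly balanced stages, no latch overhead).


def total_cycles(num_instructions: int, stall_cycles: int) -> int:
    return num_instructions + stall_cycles


def speedup_over_unpipelined(num_instructions: int, stall_cycles: int,
                             num_stages: int) -> float:
    # Unpipelined: one long cycle per instruction = num_stages short cycles.
    unpipelined = num_instructions * num_stages
    pipelined = total_cycles(num_instructions, stall_cycles)
    return unpipelined / pipelined


if __name__ == "__main__":
    n = 600
    print(speedup_over_unpipelined(n, 0, 5))       # 5.0, the ideal case
    print(speedup_over_unpipelined(n, n // 6, 5))  # hypothetical: 1 stall per 6 instructions
```

Stalls enter only the denominator, so every stall cycle pulls the speedup below the ideal value of `num_stages`.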