### Lecture 17: Pipelining

- Today's topics:
  - 5-stage pipeline
  - Hazards

### Performance Improvements?

- Does it take longer to finish each individual job?
- Does it take less time to finish a series of jobs?
- What assumptions were made while answering these questions?
  - No dependences between instructions
  - Easy to partition circuits into uniform pipeline stages
  - No latch overhead
- Is a 10-stage pipeline better than a 5-stage pipeline?
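
The last question can be explored with a rough calculation. The sketch below assumes the logic splits evenly into stages and that every stage adds a fixed latch overhead. The function names and the 10 ns / 0.5 ns figures are illustrative assumptions, not values from the lecture.

```python
def cycle_time_ns(logic_delay_ns, stages, latch_overhead_ns):
    """Clock period when the logic is split evenly and each stage adds one latch delay."""
    return logic_delay_ns / stages + latch_overhead_ns


def instruction_latency_ns(logic_delay_ns, stages, latch_overhead_ns):
    """Time for one instruction to pass through every stage."""
    return stages * cycle_time_ns(logic_delay_ns, stages, latch_overhead_ns)


def throughput_speedup(logic_delay_ns, stages, latch_overhead_ns):
    """Speedup over the unpipelined circuit, assuming one instruction completes per cycle."""
    return logic_delay_ns / cycle_time_ns(logic_delay_ns, stages, latch_overhead_ns)


# Hypothetical numbers: 10 ns of logic, 0.5 ns of latch overhead per stage.
for stages in (5, 10):
    print(stages,
          instruction_latency_ns(10.0, stages, 0.5),
          round(throughput_speedup(10.0, stages, 0.5), 2))
```

With these assumed numbers the 5-stage design gives a 4x speedup and the 10-stage design about 6.7x rather than 10x, while the latency of a single instruction grows from 10 ns to 12.5 ns and 15 ns.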

#### Quantitative Effects

- As a result of pipelining:
  - The latency of each individual instruction (in ns) goes up
  - Each instruction takes more cycles to execute
  - But average CPI remains roughly the same
  - Clock speed goes up
  - Total execution time goes down, so the average time per instruction is lower
  - Under ideal conditions, speedup
    - = ratio of elapsed times between successive instruction completions
    - = number of pipeline stages = increase in clock speed
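
The ideal-case claim can be checked with a short derivation. As assumptions for this sketch (the symbols are not from the slides), let $k$ be the number of stages, $t$ the time per stage, and $n$ the number of instructions, with no stalls and no latch overhead. The unpipelined machine spends $k\,t$ on each instruction; the pipelined machine needs $k$ cycles for the first instruction and one more cycle for each one after it.

$$
T_{\text{unpipelined}} = n\,k\,t, \qquad
T_{\text{pipelined}} = (k + n - 1)\,t, \qquad
\text{speedup} = \frac{n\,k}{k + n - 1} \;\xrightarrow{\;n \to \infty\;}\; k
$$

For example, with $k = 5$ and $n = 1000$ the speedup is $5000/1004 \approx 4.98$, already close to the number of stages.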

### A 5-Stage Pipeline



### Pipeline Summary

| Instruction     | RR                           | ALU   | DM       | RW    |
|-----------------|------------------------------|-------|----------|-------|
| ADD R1, R2 → R3 | Rd R1, R2                    | R1+R2 |          | Wr R3 |
| BEQ R1, R2, 100 | Rd R1, R2<br>Compare, Set PC |       |          |       |
| LD 8[R3] → R6   | Rd R3                        | R3+8  | Get data | Wr R6 |
| ST 8[R3] ← R6   | Rd R3, R6                    | R3+8  | Wr data  |       |

#### Conflicts/Problems

- I-cache and D-cache are accessed in the same cycle: it helps to implement them separately
- Registers are read and written in the same cycle: easy to deal with if register read/write time equals cycle time/2
- Instructions can't skip the DM stage, otherwise they would conflict for the RW stage
- A consuming instruction may have to wait for its producer
- Branch target changes only at the end of the second stage
  - What do you do in the meantime?

#### Hazards

- Structural hazards: different instructions in different stages (or the same stage) conflicting for the same resource
- Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction
- Control hazards: fetch cannot continue because it does not know the outcome of an earlier branch; this is a special case of a data hazard, but a separate category because the two are treated in different ways

#### Structural Hazards

- Example: a unified instruction and data cache → stage 4 (MEM) and stage 1 (IF) can never coincide
- The later instruction and all its successors are delayed until a cycle is found when the resource is free → these are pipeline bubbles
- Structural hazards are easy to eliminate: increase the number of resources (for example, implement a separate instruction and data cache, add more register ports)
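
A minimal sketch of the unified-cache example, assuming a single-ported cache, a DM stage that occurs three cycles after IF, and no other sources of stalls. The function name `fetch_cycles` and the boolean encoding of the program are hypothetical.

```python
def fetch_cycles(is_mem_op):
    """Return the cycle in which each instruction is fetched.

    is_mem_op[i] is True when instruction i is a load or store, i.e. it
    uses the shared cache in its DM stage, three cycles after its IF.
    """
    cache_busy = set()   # cycles in which some instruction's DM stage owns the cache
    cycle = 0
    fetches = []
    for uses_memory in is_mem_op:
        cycle += 1
        while cycle in cache_busy:   # IF must wait: this cycle becomes a bubble
            cycle += 1
        fetches.append(cycle)
        if uses_memory:
            cache_busy.add(cycle + 3)
    return fetches


# One load followed by four instructions that do not touch memory.
print(fetch_cycles([True, False, False, False, False]))
```

Here the fourth instruction is fetched in cycle 5 rather than cycle 4, because the load occupies the cache in cycle 4. That lost cycle is the bubble, and every later instruction inherits the delay.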

#### Data Hazards

- An instruction produces a value in a given pipeline stage
- A subsequent instruction consumes that value in a pipeline stage
- The consumer may have to be delayed so that the time of consumption is later than the time of production
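
The timing argument above can be written as a small calculation. This is a sketch under assumed conventions: cycles are counted with the producer's IF as cycle 1, stages are numbered IF=1 through RW=5, and `first_usable_cycle` (a hypothetical parameter) is the first cycle in which a consumer can pick up the value.

```python
def stall_cycles(first_usable_cycle, consume_stage, distance):
    """Bubbles the consumer needs so that consumption is not earlier than production.

    first_usable_cycle: first cycle in which the value can be consumed
                        (the producer's IF is cycle 1).
    consume_stage:      1-based stage in which the consumer needs the value.
    distance:           1 if the consumer immediately follows the producer,
                        2 if one instruction sits between them, and so on.
    """
    # Without stalls the consumer's IF is cycle 1 + distance, so it reaches
    # stage s in cycle distance + s.
    unstalled_cycle = distance + consume_stage
    return max(0, first_usable_cycle - unstalled_cycle)


# No bypassing: the value is written in RW (cycle 5) and, assuming the register
# file is written in the first half-cycle and read in the second, it can be
# read in D/R (stage 2) during that same cycle.
print(stall_cycles(first_usable_cycle=5, consume_stage=2, distance=1))

# Bypassing: the ALU result (end of cycle 3) is forwarded to the ALU stage
# (stage 3) of the next instruction, so it is usable from cycle 4.
print(stall_cycles(first_usable_cycle=4, consume_stage=3, distance=1))
```

Under these assumptions a back-to-back dependent pair needs two stall cycles without bypassing and none with it.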

### Example 1 – No Bypassing

Show the instruction occupying each stage in each cycle (no bypassing)
 if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R7+R8→R9

| Stage | CYC-1 | CYC-2 | CYC-3 | CYC-4 | CYC-5 | CYC-6 | CYC-7 | CYC-8 |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| IF    | I1    | I2    | I3    | I3    | I3    | I4    | I5    |       |
| D/R   |       | I1    | I2    | I2    | I2    | I3    | I4    |       |
| ALU   |       |       | I1    |       |       | I2    | I3    |       |
| DM    |       |       |       | I1    |       |       | I2    | I3    |
| RW    |       |       |       |       | I1    |       |       | I2    |
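
The table can be reproduced with a small in-order scheduling sketch. It assumes every instruction is a three-register ALU operation, that a register written in RW can be read in D/R during the same cycle, and that stalls happen only in D/R. `schedule_no_bypass` and `occupancy` are hypothetical helper names. Only I1 to I3 are modeled, so the I4 and I5 entries of the table do not appear.

```python
STAGES = ["IF", "D/R", "ALU", "DM", "RW"]


def schedule_no_bypass(instrs):
    """instrs: list of (dest, src1, src2). Returns, per instruction, the
    cycle in which it first occupies each stage."""
    last_write = {}   # register -> cycle in which its latest producer is in RW
    sched = []
    for dest, src1, src2 in instrs:
        if sched:
            prev = sched[-1]
            if_start = prev["D/R"]                      # IF frees up when the predecessor enters D/R
            dr_start = max(if_start + 1, prev["ALU"])   # D/R frees up when the predecessor enters ALU
        else:
            if_start, dr_start = 1, 2
        # The final D/R cycle must be no earlier than each producer's RW cycle.
        ready = max(last_write.get(src1, 0), last_write.get(src2, 0))
        alu = max(dr_start, ready) + 1
        entry = {"IF": if_start, "D/R": dr_start, "ALU": alu, "DM": alu + 1, "RW": alu + 2}
        last_write[dest] = entry["RW"]
        sched.append(entry)
    return sched


def occupancy(sched, n_cycles):
    """Build stage -> list of instruction labels, one entry per cycle."""
    grid = {stage: [""] * n_cycles for stage in STAGES}
    for i, entry in enumerate(sched, start=1):
        for s, stage in enumerate(STAGES):
            first = entry[stage]
            # An instruction stays in a stage until it enters the next one.
            last = entry[STAGES[s + 1]] - 1 if s + 1 < len(STAGES) else first
            for cycle in range(first, min(last, n_cycles) + 1):
                grid[stage][cycle - 1] = f"I{i}"
    return grid


program = [("R3", "R1", "R2"), ("R5", "R3", "R4"), ("R9", "R7", "R8")]
grid = occupancy(schedule_no_bypass(program), 8)
for stage in STAGES:
    print(f"{stage:4}", " ".join(f"{cell:2}" for cell in grid[stage]))
```

I2 sits in D/R for cycles 3 to 5 because R3 is not written until I1 reaches RW in cycle 5, and I3 waits in IF behind it even though it has no dependence of its own.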


### Example 2 – Bypassing

Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R3+R8→R9.
 Identify the input latch for each input operand.

| Stage | CYC-1 | CYC-2 | CYC-3      | CYC-4      | CYC-5      | CYC-6 | CYC-7 | CYC-8 |
|-------|-------|-------|------------|------------|------------|-------|-------|-------|
| IF    | I1    | I2    | I3         | I4         | I5         |       |       |       |
| D/R   |       | I1    | I2         | I3         | I4         |       |       |       |
| ALU   |       |       | I1 (L3 L3) | I2 (L4 L3) | I3 (L5 L3) |       |       |       |
| DM    |       |       |            | I1         | I2         | I3    |       |       |
| RW    |       |       |            |            | I1         | I2    | I3    |       |

The pair after each ALU entry names the input latch for each of the two source operands.
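
The latch choice can be sketched as a lookup on how far back the producer is. This assumes a latch numbering in which L3 feeds the ALU from D/R, L4 holds the result of the instruction one ahead, and L5 holds the result of the instruction two ahead. It also assumes all instructions are ALU operations, so no stalls occur. `input_latches` is a hypothetical helper name.

```python
def input_latches(instrs):
    """instrs: list of (dest, src1, src2).
    Returns, per instruction, the latch that supplies each source operand."""
    result = []
    for i, (_dest, *srcs) in enumerate(instrs):
        latches = []
        for reg in srcs:
            latch = "L3"   # default: value comes from the register file via D/R
            for distance in (1, 2):   # nearest producer wins
                j = i - distance
                if j >= 0 and instrs[j][0] == reg:
                    latch = "L4" if distance == 1 else "L5"
                    break
            latches.append(latch)
        result.append(tuple(latches))
    return result


program = [("R3", "R1", "R2"), ("R5", "R3", "R4"), ("R9", "R3", "R8")]
for number, latches in enumerate(input_latches(program), start=1):
    print(f"I{number}", *latches)
```

For I3 this yields L5 for R3 and L3 for R8: I1 is two instructions ahead, so its result has already moved one latch further down the pipeline.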

### Problem 1



### Problem 2



### Problem 3


