# Lecture 18: Pipelining

- Today's topics:
  - 5-stage pipeline
  - Hazards
  - Data dependence handling with bypassing
  - Data dependence examples

Hw 7 posted later today (due next Tue/Wed)

#### -cycle proc A 5-Stage Pipeline Cyc! Time (in clock cycles) CC 6 CC 3 ALU DM Reg »IZ fin In cych IM Reg DM A T3 fin in cyc7 Reg Rec A A Beg 2 Source **RP** textbook



# **Quantitative Effects**

- As a result of pipelining:
  - Time in ns per instruction goes up
  - Each instruction takes more cycles to execute
  - But... average CPI remains roughly the same
  - Clock speed goes up
  - Total execution time goes down, resulting in lower average time per instruction
  - Under ideal conditions, speedup
    - gap bet = ratio of elapsed times between successive instruction
  - completions
    - = number of pipeline stages = increase in clock speed

Hazards



- Structural hazards: different instructions in different stages (or the same stage) conflicting for the same resource
- Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction

 Control hazard. fetch cannot continue because it does not know the outcome of an earlier branch – special case of a data hazard – separate category because they are treated in different ways

# **Conflicts/Problems**

- JM JM
  I-cache and D-cache are accessed in the same cycle it helps to implement them separately
- Registers are read and written in the same cycle easy to deal with if register read/write time equals cycle time/2
- Instructions can't skip the DM stage, else conflict for RW
- Consuming instruction may have to wait for producer

Branch target changes only at the end of the second stage
 -- what do you do in the meantime?

- Example: a unified instruction and data cache → stage 4 (MEM) and stage 1 (IF) can never coincide
- The later instruction and all its successors are delayed until a cycle is found when the resource is free → these are pipeline bubbles
- Structural hazards are easy to eliminate increase the number of resources (for example, implement a separate instruction and data cache, add more register ports)

- An instruction *produces* a value in a given pipeline stage
- A subsequent instruction *consumes* that value in a pipeline stage
- The consumer may have to be delayed so that the time of consumption is later than the time of production



## Example 1 – No Bypassing

Show the instruction occupying each stage in each cycle (no bypassing) if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R7+R8→R9



# Example 2 – (Bypassing

Show the instruction occupying each stage in each cycle (with bypassing)
 if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R3+R8→R9.
 Identify the input latch for each input operand.



## Example 2 – Bypassing

Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R3+R8→R9.
 Identify the input latch for each input operand.



# Problem 1



# Problem 2



## Problem 3

