#### Lecture 22: Out-of-Order, Cache Hierarchies

HW 8 due tomorrow.

HW 9 posted Thursday, due 4/11.

Today's topics:

- Out of order processors
- Cache access intro and details


#### An Out-of-Order Processor Implementation



**Example Code**

- ADD R1, R2, R3
- ADD R4, R1, R2
- LW R5, 8(R4)
- ADD R7, R6, R5
- ADD R8, R7, R5
- LW R9, 16(R4)
- ADD R10, R6, R9
- ADD R11, R10, R9

Notes: branch predictor accuracy = 90% (0.9 × 0.9 × 0.9); completion times are compared for in-order and out-of-order (OoO) execution; IPC for the 8 instructions = 1.33.
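The register dependences in the example code determine how much an out-of-order core can overlap. The following is a minimal sketch, not the lecture's method: it lists read-after-write (RAW) dependences and the length of the longest dependence chain ending at each instruction. The names `PROGRAM`, `raw_dependences`, and `dependence_depth` are hypothetical, and no instruction latencies are modeled, so the result is not a cycle count.

```python
# Each entry: (instruction text, destination register, source registers).
# The register lists are transcribed from the example code above.
PROGRAM = [
    ("ADD R1, R2, R3",   "R1",  ["R2", "R3"]),
    ("ADD R4, R1, R2",   "R4",  ["R1", "R2"]),
    ("LW  R5, 8(R4)",    "R5",  ["R4"]),
    ("ADD R7, R6, R5",   "R7",  ["R6", "R5"]),
    ("ADD R8, R7, R5",   "R8",  ["R7", "R5"]),
    ("LW  R9, 16(R4)",   "R9",  ["R4"]),
    ("ADD R10, R6, R9",  "R10", ["R6", "R9"]),
    ("ADD R11, R10, R9", "R11", ["R10", "R9"]),
]


def raw_dependences(program):
    """Map each instruction index to the indices of the instructions it reads from."""
    last_writer = {}  # register -> index of the most recent instruction that wrote it
    deps = {}
    for i, (_, dest, srcs) in enumerate(program):
        deps[i] = sorted({last_writer[r] for r in srcs if r in last_writer})
        last_writer[dest] = i
    return deps


def dependence_depth(program):
    """Length of the longest RAW chain ending at each instruction (1 = no producer)."""
    deps = raw_dependences(program)
    depth = []
    for i in range(len(program)):
        depth.append(1 + max((depth[p] for p in deps[i]), default=0))
    return depth


if __name__ == "__main__":
    for (text, _, _), d in zip(PROGRAM, dependence_depth(PROGRAM)):
        print(f"{text:<18} chain depth {d}")
```

In this sketch, `LW R9, 16(R4)` depends only on the second ADD, not on the chain through R5, so an out-of-order core is free to start it before the instructions that wait on `LW R5`.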

#### **Cache Hierarchies**



- Data and instructions are stored on DRAM chips. DRAM is a technology that has high bit density but relatively poor latency: an access to data in memory can take as many as 300 cycles today!
- Hence, some data is stored on the processor in a structure called the cache. Caches employ SRAM technology, which is faster but has lower bit density.
- Internet browsers also cache web pages: same concept.

#### Memory Hierarchy



#### Locality

- Why do caches work?
  - Temporal locality: if you used some data recently, you will likely use it again
  - Spatial locality: if you used some data recently, you will likely access its neighbors
- No hierarchy: average access time for data = 300 cycles per access
- 32KB 1-cycle L1 cache that has a hit rate of 95%: for 100 accesses, total time = $95 \times 1 + 5 \times 301 = 1600$ cycles, so time per memory access = 16 cycles
- 1-cycle L1 with 90% hit rate and 10-cycle L2 with 50% hit rate: for 100 accesses, 90 L1 hits take $90 \times 1$ cycles, 5 L2 hits take $5 \times 11$ cycles, and 5 L2 misses (memory) take $5 \times (1 + 10 + 300)$ cycles, for a total of $90 + 55 + 1555 = 1700$ cycles, so time per memory access = 17 cycles
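The access-time arithmetic above can be checked with a short script. This is a sketch under the same assumptions as the worked numbers: every access pays the latency of each level it reaches, and an access that misses in all caches also pays the 300-cycle memory latency. The function name `average_access_time` and its parameters are hypothetical.

```python
def average_access_time(levels, memory_latency):
    """Expected cycles per access.

    levels: list of (latency_cycles, hit_rate) pairs, ordered from L1 outward.
    memory_latency: cycles added when the access misses in every cache level.
    """
    expected = 0.0
    reach_prob = 1.0  # probability that an access gets as far as this level
    for latency, hit_rate in levels:
        expected += reach_prob * latency
        reach_prob *= 1.0 - hit_rate
    expected += reach_prob * memory_latency
    return expected


no_hierarchy = average_access_time([], 300)                    # 300 cycles
l1_only = average_access_time([(1, 0.95)], 300)                # about 16 cycles
l1_and_l2 = average_access_time([(1, 0.90), (10, 0.50)], 300)  # about 17 cycles

if __name__ == "__main__":
    print(round(no_hierarchy, 2), round(l1_only, 2), round(l1_and_l2, 2))
```

The results are floating-point values, so they match 16 and 17 only up to rounding error.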


#### Accessing the Cache



#### The Tag Array



#### **Example Access Pattern**



#### **Increasing Line Size**



#### Associativity




#### Example 1

- 32 KB 4-way set-associative data cache array with 32 byte line sizes
- How many sets?
- How many index bits, offset bits, tag bits?
- How large is the tag array?

```
Cache size  = #sets x #ways x block size
Index bits  = log2(#sets)
Offset bits = log2(block size)
Addr width  = tag + index + offset
```

#### Example 1

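- 32 KB 4-way set-associative data cache array with 32 byte line sizes
- How many sets? Cache size = #sets x #ways x block size, so #sets = 32 KB / (4 x 32 B) = 256
- How many index bits, offset bits, tag bits?
  - Index bits = $\log_2(\text{sets})$ = 8
  - Offset bits = $\log_2(\text{block size})$ = 5
  - Tag bits = address size - index - offset = 19 (the answer implies 32-bit addresses: 32 - 8 - 5 = 19)
- How large is the tag array? Tag array size = #sets x #ways x tag size = 256 x 4 x 19 bits = 19 Kb = 2.375 KB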
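The Example 1 arithmetic can be reproduced with a small helper. This is a sketch, assuming 32-bit addresses (implied by the 19-bit tag) and power-of-two sizes. The names `cache_geometry` and `split_address`, and the sample address `0x12345678`, are hypothetical and not from the lecture.

```python
def cache_geometry(cache_bytes, ways, line_bytes, addr_bits=32):
    """Derive set count and address-field widths from the cache parameters.

    Assumes the set count and line size are powers of two.
    """
    sets = cache_bytes // (ways * line_bytes)   # cache size = sets x ways x line size
    index_bits = sets.bit_length() - 1          # log2(sets) for a power of two
    offset_bits = line_bytes.bit_length() - 1   # log2(line size) for a power of two
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "sets": sets,
        "index_bits": index_bits,
        "offset_bits": offset_bits,
        "tag_bits": tag_bits,
        "tag_array_bits": sets * ways * tag_bits,  # one tag per line
    }


def split_address(addr, index_bits, offset_bits):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    g = cache_geometry(32 * 1024, ways=4, line_bytes=32)
    print(g)
    print("tag array size in KB:", g["tag_array_bits"] / 8 / 1024)
    print(split_address(0x12345678, g["index_bits"], g["offset_bits"]))
```

The index field selects one of the 256 sets, and the tag is then compared against the 4 tags stored in that set.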