#### Assignment 9

##### Due: 9:00am, Thu Apr 11th, 2024

Note: Make reasonable assumptions where necessary and state them clearly. Feel free to discuss problems with classmates, but the only written materials you may consult while writing your solutions are the textbook and the lecture slides/videos. Solutions should be uploaded on Gradescope. Show your solution steps so that you can receive partial credit for incorrect answers and so that we know you have understood the material. Don't just show us the final answer.

Every homework has an automatic, penalty-free 1.5-day extension to accommodate any COVID- or family-related disruptions. In other words, try to finish your homework by Thursday 10:45am to keep up with the lecture content, but if necessary, you may take until Friday 11:59pm.

1. Consider a program that would execute with no stalls and a CPI of 1 if the underlying processor could somehow magically service every load instruction with a 1-cycle L1 cache hit. In practice, 9% of all load instructions suffer an L1 cache miss, 6% of all load instructions suffer an L2 cache miss, and 3% of all load instructions suffer an L3 cache miss (and are serviced by the memory system). An L1 cache miss stalls the processor for 8 cycles while the L2 is looked up. An L2 cache miss stalls the processor for 25 cycles while the L3 is looked up. An L3 cache miss stalls the processor for an additional 200 cycles while data is fetched from memory. What is the CPI for this program if 35% of the program's instructions are load instructions? (40 points)
2. Consider an L1 cache that has 8 sets, is direct-mapped (1-way), and has a block size of 64 bytes. How many bits of the address are used for the offset, the index, and the tag (assume that the CPU generates 32-bit addresses)? For the following memory access pattern (shown as byte addresses), show which accesses are hits and which are misses. For each hit, indicate the set that yields the hit. (30 points)

   40, 96, 112, 24, 400, 500, 560, 32, 600, 48, 80
3. A 128 KB L1 cache has a 128-byte block size and is 2-way set-associative. How many sets does the cache have? How many bits are used for the offset, the index, and the tag, assuming that the CPU provides 32-bit addresses? How large is the tag array? Please show your equations and steps. (30 points)
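For Problem 1, the stall contribution of each cache level can be written as (fraction of loads that miss at that level) × (stall cycles for that miss), summed over levels and scaled by the fraction of instructions that are loads. The sketch below is one way to express that arithmetic, assuming the miss percentages are global rates (fractions of *all* loads) and that each level's stall is added on top of the levels above it. The function name `effective_cpi` and the numbers in the usage line are hypothetical and deliberately differ from the problem's.

```python
def effective_cpi(base_cpi, load_fraction, miss_levels):
    """Sketch: CPI = base CPI + (loads per instruction) * (stall cycles per load).

    miss_levels is a list of (miss_rate, stall_cycles) pairs, one per cache
    level. Assumptions: each miss_rate is a global rate (fraction of all
    loads that miss at that level), and each level's stall is added on top
    of the stalls of the levels above it.
    """
    stall_cycles_per_load = sum(rate * cycles for rate, cycles in miss_levels)
    return base_cpi + load_fraction * stall_cycles_per_load


# Hypothetical example (not the numbers from Problem 1):
# 20% loads; 10% of loads miss L1 (10 cycles), 5% miss L2 (+20 cycles),
# 2% miss L3 (+100 cycles).
cpi = effective_cpi(1.0, 0.20, [(0.10, 10), (0.05, 20), (0.02, 100)])
print(f"CPI = {cpi:.2f}")  # stall per load = 1 + 1 + 2 = 4, so CPI = 1 + 0.2 * 4 = 1.80
```

Keeping the per-level terms separate makes it easy to see which level dominates the stall time; in this hypothetical case the rare L3 misses contribute as much as the other two levels combined.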
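For Problem 2, a direct-mapped lookup splits each byte address into offset (low log2(block size) bits), index (next log2(number of sets) bits), and tag (the remaining high bits). An access hits only if the set selected by the index currently holds the same tag. The sketch below models that, assuming power-of-two sizes and using `None` to stand in for an invalid (empty) line. The function names and the small configuration and trace in the usage lines are hypothetical and are not the ones in the problem.

```python
def split_address(addr, num_sets, block_size):
    """Split a byte address into (tag, index, offset).

    Assumes num_sets and block_size are powers of two.
    """
    assert num_sets & (num_sets - 1) == 0 and block_size & (block_size - 1) == 0
    offset_bits = block_size.bit_length() - 1   # log2(block_size)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    offset = addr & (block_size - 1)            # low offset_bits bits
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)    # everything above the index
    return tag, index, offset


def simulate_direct_mapped(trace, num_sets, block_size):
    """Return a list of (addr, 'hit' or 'miss', set_index) for each access."""
    stored_tags = [None] * num_sets  # one block (tag) per set: 1-way; None = invalid
    results = []
    for addr in trace:
        tag, index, _ = split_address(addr, num_sets, block_size)
        if stored_tags[index] == tag:
            results.append((addr, "hit", index))
        else:
            stored_tags[index] = tag  # miss: fetch the block, evicting the old one
            results.append((addr, "miss", index))
    return results


# Hypothetical configuration and trace (not the ones from Problem 2):
# 4 sets, 16-byte blocks -> 4 offset bits, 2 index bits.
for addr, outcome, s in simulate_direct_mapped([0, 8, 64, 4, 20, 24], 4, 16):
    print(f"addr {addr:3d}: {outcome:4s} (set {s})")
```

In this hypothetical trace, address 64 maps to the same set as address 0 but has a different tag, so it evicts that block and the later access to address 4 misses again. This is the kind of conflict miss a direct-mapped cache is prone to.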
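For Problem 3, the usual relations are: number of blocks = capacity / block size, number of sets = blocks / associativity, offset bits = log2(block size), index bits = log2(sets), and tag bits = address bits − index bits − offset bits. The tag array then holds one tag per block. The sketch below encodes those relations, assuming power-of-two sizes and counting only tag bits (no valid, dirty, or replacement bits). The function names and the example cache in the usage line are hypothetical and differ from the problem's.

```python
def log2_exact(n):
    """log2 of a power of two (raises ValueError otherwise)."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def cache_geometry(capacity_bytes, block_bytes, ways, address_bits=32):
    num_blocks = capacity_bytes // block_bytes      # total cache lines
    num_sets = num_blocks // ways                   # sets = blocks / associativity
    offset_bits = log2_exact(block_bytes)
    index_bits = log2_exact(num_sets)
    tag_bits = address_bits - index_bits - offset_bits
    # Assumption: one tag per block; valid/dirty/replacement bits are ignored.
    tag_array_bits = num_blocks * tag_bits
    return {
        "sets": num_sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "tag_array_bits": tag_array_bits,
    }


# Hypothetical cache (not the one from Problem 3): 32 KB, 64-byte blocks, 4-way.
print(cache_geometry(32 * 1024, 64, 4))
# {'sets': 128, 'offset_bits': 6, 'index_bits': 7, 'tag_bits': 19, 'tag_array_bits': 9728}
```

For this hypothetical cache, 9728 tag bits is 1216 bytes. If a design also stores a valid bit (and possibly a dirty bit) per block, those would be added to `tag_bits` before multiplying by the number of blocks.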