Assignment 9
Due: 9:00am, Thu Apr 11th, 2024
Note: Make reasonable assumptions where necessary and clearly state them.
Feel free to discuss problems with classmates, but the only written materials
that you may consult while writing your solutions are the textbook
and lecture slides/videos.
Solutions should be uploaded to Gradescope.
Show your solution steps so that you can receive partial credit for incorrect
answers and so that we know you have understood the material. Don't just show
us the final answer.
Every homework has an automatic penalty-free 1.5-day extension to
accommodate any COVID/family-related disruptions. In other words, try to
finish your homework by Thursday 10:45am to keep up with the lecture
content, but if necessary, you may take until Friday 11:59pm.
- Consider a program that can execute with no stalls and a CPI of 1
if the underlying processor can somehow magically service every load
instruction with a 1-cycle L1 cache hit. In practice, 9% of all
load instructions suffer from an L1 cache miss, 6% of all load
instructions suffer from an L2 cache miss, and 3% of all load
instructions suffer from an L3 cache miss (and are serviced by the
memory system). An L1 cache miss stalls the processor for 8 cycles
while the L2 is looked up. An L2 cache miss stalls the processor for
25 cycles while the L3 is looked up. An L3 cache miss stalls the
processor for an additional 200 cycles while data is fetched from memory.
What is the CPI for this program if 35% of the program's instructions
are load instructions? (40 points)
- Consider an L1 cache that has 8 sets, is direct-mapped (1-way), and
supports a block size of 64 bytes. How many bits of the address are
used to calculate the offset, index, and tag (assume that the CPU
generates 32-bit addresses)? For the following memory
access pattern (shown as byte addresses), show which accesses are hits
and which are misses. For each hit, indicate the set that yields the hit.
40, 96, 112, 24, 400, 500, 560, 32, 600, 48, 80.
(30 points)
- A 128 KB L1 cache has a 128-byte block size and is 2-way set-associative.
How many sets does the cache have? How many bits are used for the
offset, index, and tag, assuming that the CPU provides 32-bit addresses?
How large is the tag array? Please show your equations and steps.
(30 points)
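
Optional sanity check: the Python sketch below is not part of the required
solution, but it may help you check your own arithmetic for the kinds of
calculations used above (stall-adjusted CPI, address field widths, tag array
size, and a direct-mapped hit/miss trace). All function names and example
parameters are hypothetical and deliberately differ from the problems above.
It assumes that stall penalties add up level by level, that all sizes are
powers of two, that the cache starts empty, and that the tag array stores
only tag bits (no valid or dirty bits). State your own assumptions if they
differ.

```python
def cpi_with_load_stalls(base_cpi, load_fraction, miss_levels):
    """Return CPI including load-miss stalls.

    miss_levels is a list of (fraction_of_loads_missing, stall_cycles) pairs,
    one per cache level. Assumption: penalties are additive, so every load
    that misses at a level pays that level's stall on top of earlier ones.
    """
    stall_per_load = sum(frac * cycles for frac, cycles in miss_levels)
    return base_cpi + load_fraction * stall_per_load


def log2_exact(n):
    """Return log2(n) for a positive power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1) != 0:
        raise ValueError(f"{n} is not a positive power of two")
    return n.bit_length() - 1


def address_fields(cache_bytes, block_bytes, ways, addr_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a cache."""
    sets = cache_bytes // (block_bytes * ways)
    offset_bits = log2_exact(block_bytes)
    index_bits = log2_exact(sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def tag_array_bits(cache_bytes, block_bytes, ways, addr_bits=32):
    """Return tag array size in bits: one tag per block in the cache.

    Assumption: only tag bits are counted (no valid/dirty/LRU bits).
    """
    sets, _, _, tag_bits = address_fields(cache_bytes, block_bytes, ways, addr_bits)
    return sets * ways * tag_bits


def simulate_direct_mapped(addresses, num_sets, block_bytes):
    """Trace byte addresses through an initially empty direct-mapped cache.

    Returns a list of (address, set_index, "hit" or "miss") tuples.
    """
    stored_tags = {}  # set index -> tag of the block currently held there
    trace = []
    for addr in addresses:
        block_number = addr // block_bytes
        set_index = block_number % num_sets
        tag = block_number // num_sets
        hit = stored_tags.get(set_index) == tag
        stored_tags[set_index] = tag  # on a miss, the new block replaces the old
        trace.append((addr, set_index, "hit" if hit else "miss"))
    return trace


if __name__ == "__main__":
    # Hypothetical values, not the ones from the problems above.
    print(cpi_with_load_stalls(1.0, 0.2, [(0.10, 10), (0.05, 100)]))
    print(address_fields(32 * 1024, 64, 4))
    print(tag_array_bits(32 * 1024, 64, 4))
    for row in simulate_direct_mapped([0, 4, 64, 0, 20, 16], num_sets=4, block_bytes=16):
        print(row)
```

In the small trace at the bottom, addresses 0 and 64 map to the same set with
different tags, so the second access to address 0 misses again; this is the
conflict behavior to watch for in a direct-mapped cache.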