Assignment 9
Due: 10:45am, Thu Apr 10th, 2025
Note: Make reasonable assumptions where necessary and clearly state them.
Feel free to discuss problems with classmates, but the only written materials
that you may consult while writing your solutions are the textbook
and lecture slides/videos.
Solutions should be uploaded on Gradescope.
Show your solution steps so that you can receive partial credit for incorrect
answers and so that we know you have understood the material. Don't just show
us the final answer.
We require that answers be typed up and not hand-written.
Every homework has an automatic penalty-free 1.5-day extension to
accommodate any health/family-related disruptions. In other words, try to
finish your homework by Thursday 10:45am to keep up with the lecture
content, but if necessary, you may take until Friday 11:59pm.
- Consider a program that can execute with no stalls and a CPI of 1
if the underlying processor can somehow magically service every load
instruction with a 1-cycle L1 cache hit. In practice, 8% of all
load instructions suffer from an L1 cache miss, 5% of all load
instructions suffer from an L2 cache miss, and 2% of all load
instructions suffer from an L3 cache miss (and are serviced by the
memory system). Note that the above phrasing is different from Example
3 that we worked out in Lecture 22. An L1 cache miss stalls the
processor for 8 cycles while the L2 is looked up. An L2 cache miss stalls
the processor for 20 cycles while the L3 is looked up. An L3 cache miss
stalls the processor for an additional 200 cycles while data is fetched
from memory.
What is the CPI for this program if 30% of the program's instructions
are load instructions? (30 points)
- Consider an L1 cache that has 8 sets, is direct-mapped (1-way), and
supports a block size of 64 bytes. How many bits of the address are
used to calculate the offset, index, and tag (assume that the CPU
generates 32-bit addresses)? Given an address, what are the equations
you will use to extract the offset, index, and tag bits from that
address? For the following memory access pattern (shown as byte
addresses), construct a table that indicates the offset, index, and tag
bits for each address (in either decimal or binary). In the table, also
indicate whether each access is a hit or a miss.
28, 80, 96, 552, 600, 4
(50 points)
- A 128 KB L1 cache has a 64-byte block size and is 4-way set-associative.
How many sets does the cache have? How many bits are used for the
offset, index, and tag, assuming that the CPU provides 32-bit addresses?
How large is the tag array? Please show your equations and steps.
(20 points)
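
The calculations that the problems above rely on can be sketched in a few
lines of Python. This is only an illustrative sketch, not a reference
solution: the function names and every parameter value in the usage lines are
hypothetical and deliberately differ from the numbers in the problems. It
assumes power-of-two block sizes and set counts, a cache that starts out
empty, and a tag array that stores only tag bits (no valid or dirty bits).
State your own assumptions in your write-up.

```python
def cpi_with_stalls(base_cpi, event_fraction, stalls):
    """Base CPI plus the average stall cycles per instruction.

    event_fraction: fraction of all instructions that can stall (e.g. loads).
    stalls: list of (probability_per_event, penalty_cycles) pairs.
    """
    extra_per_event = sum(prob * penalty for prob, penalty in stalls)
    return base_cpi + event_fraction * extra_per_event


def cache_geometry(capacity_bytes, block_bytes, ways, address_bits=32):
    """Return set count, address field widths, and tag array size in bits."""
    sets = capacity_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1  # log2 for powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    # Assumption: the tag array holds one tag per block and nothing else.
    tag_array_bits = sets * ways * tag_bits
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "tag_array_bits": tag_array_bits,
    }


def split_address(addr, block_bytes, sets):
    """Split a byte address into (tag, index, offset)."""
    offset = addr % block_bytes
    index = (addr // block_bytes) % sets
    tag = addr // (block_bytes * sets)
    return tag, index, offset


def simulate_direct_mapped(addresses, block_bytes, sets):
    """Replay byte addresses on an initially empty direct-mapped cache."""
    stored_tags = {}  # index -> tag currently held in that set
    rows = []
    for addr in addresses:
        tag, index, offset = split_address(addr, block_bytes, sets)
        hit = stored_tags.get(index) == tag
        stored_tags[index] = tag  # on a miss, the new block replaces the old
        rows.append((addr, tag, index, offset, "hit" if hit else "miss"))
    return rows


# Hypothetical values, chosen only to exercise the helpers.
example_cpi = cpi_with_stalls(1.0, 0.20, [(0.10, 10), (0.01, 100)])
example_geometry = cache_geometry(1024, 16, 2, address_bits=16)
example_rows = simulate_direct_mapped([0, 20, 4, 64, 16], block_bytes=16, sets=4)
```

With these hypothetical inputs, address 64 maps to the same set as address 0
but carries a different tag, so it evicts the earlier block. That is the kind
of conflict a hit/miss table should make visible.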