HW4: Caches and Cache-CoherenceNote: Make reasonable assumptions where necessary and clearly state them. Feel free to discuss problems with classmates, but the only written material that you may consult while writing your solutions are the textbook, lecture notes, and lecture slides.
Memory access times
Consider a processor and a program that would have an IPC of 1 with a
perfect 1-cycle L1 cache. Assume that each additional cycle for cache/memory
access causes program execution time to increase by one cycle. Assume the
following MPKIs and latencies for the following caches:
Cache OrganizationA 64 MB L3 cache has a 128 byte line (block) size and is 32-way set-associative. How many sets does the cache have? How many bits are used for the offset, index, and tag, assuming that the CPU provides 48-bit addresses? How large is the tag array? (If you do not explain your steps, you will not receive partial credit for an incorrect answer.)Cache Miss RatesFor the following access pattern, (i) indicate if each access is a hit or miss. (ii) What is the hit rate? Assume that the cache has 2 sets and is 2-way set-associative. Assume that block A maps to set 0, B to set 1, C to set 0, D to set 1, E to set 0, F to set 1. Assume an LRU replacement policy.Does the hit rate improve if you assume a fully-associative cache of the same size (again, indicate if each access is a hit or a miss)? Access pattern: ABCBABECADBCAFDBCEA Virtually Indexed CacheAssume that the OS uses a page size of 8 KB. Assume that your L1 cache must be 4-way set-associative, and the cache line size is 64 bytes. If you're trying to correctly implement a virtually indexed physically tagged cache (with no additional support from the OS), what is the largest L1 cache that you can design?Snooping-Based Cache CoherenceConsider a symmetric shared-memory multiprocessor (3 processors sharing a bus) implementing a snooping cache coherence protocol such as the one discussed in class. For each of the events below, explain the coherence protocol steps (does the cache flag a hit/miss, what request is placed on the bus, who responds, is a writeback required, etc.) and mention the eventual state of the data block in the caches of each of the 3 processors. Assume that the caches are direct-mapped and that each cache line only stores one word and that words X and Y map to the same cache line in each cache (in other words, X and Y cannot both be in the cache at the same time). At the start, X and Y are not in any of the three caches.
P1: Write X
SubimtSubmit your solution through Gradescope HW4 (as a PDF file (please mark which parts of the PDF are used for each question (this can be done through Gradescope)). |
|||
Updated: March, 2019
|