CS/EE 3810

Assignment 8

Due: 10:45am, Tue Apr 2nd, 2024

Note: Make reasonable assumptions where necessary and clearly state them. Feel free to discuss problems with classmates, but the only written materials that you may consult while writing your solutions are the textbook and lecture slides/videos. Solutions should be uploaded on Gradescope. Show your solution steps so that you receive partial credit for incorrect answers and we know you have understood the material. Don't just show us the final answer.

Every homework has an automatic, penalty-free 1.5-day extension to accommodate any covid/family-related disruptions. In other words, try to finish your homework by Tuesday 10:45am to keep up with the lecture content, but if necessary, you may take until Wednesday 11:59pm.

Consider an in-order 5-stage pipeline similar to the one discussed in class, e.g., see slides 3-9 of lecture 19. First assume that the pipeline does not support bypassing (forwarding). What are the stall cycles introduced between the following pairs of back-to-back instructions? Then, solve the same problem while assuming support for bypassing. Clearly show your work, i.e., show how each instruction goes through the 5 stages, indicate the point of production and point of consumption, show how the consuming instruction is held back in the D/R stage when there are stalls (similar to the example on slide 3 of lecture 19). Recall that a register read is performed in the second half of the D/R stage and a register write is performed in the first half of the RW stage. (60 points)
1. add $1, $2, $3
  add $4, $1, $2
2. lw $1, 8($2)
  add $4, $1, $3
3. lw $1, 8($2)
  sw $3, 8($1)
4. lw $1, 8($2)
  sw $1, 8($4)
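
One possible way to lay out the requested timing diagrams is sketched below. It only formats a diagram for a stall count that you supply; it does not compute the stalls for any of the pairs above. The stage names IF, ALU, and DM are assumed here (only D/R and RW are named in this assignment), and the helper names `pipeline_rows` and `render` are hypothetical.

```python
# Minimal sketch: draw a two-instruction pipeline diagram for a given stall count.
# Assumption: the five stages are named IF, D/R, ALU, DM, RW.
STAGES = ["IF", "D/R", "ALU", "DM", "RW"]


def pipeline_rows(stalls):
    """Return (producer, consumer) stage lists indexed by cycle.

    The consumer is fetched one cycle after the producer and repeats
    the D/R stage once per stall cycle before moving on.
    """
    producer = list(STAGES)
    consumer = ["", "IF"] + ["D/R"] * (1 + stalls) + ["ALU", "DM", "RW"]
    # Pad the producer row so both rows cover the same number of cycles.
    producer += [""] * (len(consumer) - len(producer))
    return producer, consumer


def render(producer_name, consumer_name, stalls):
    """Format the diagram as text with one column per cycle."""
    producer, consumer = pipeline_rows(stalls)
    header = ["cycle"] + [str(c + 1) for c in range(len(consumer))]
    rows = [header, [producer_name] + producer, [consumer_name] + consumer]
    return "\n".join(
        " ".join(cell.ljust(8) for cell in row).rstrip() for row in rows
    )


# The stall count 3 is an arbitrary placeholder, not an answer to any part.
print(render("producer", "consumer", 3))
```

In your own solution, replace the placeholder stall count with the value you derive, and mark the point of production and the point of consumption on the diagram.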
Consider a program that executes a large number of instructions. Assume that the program does not suffer from stalls from data hazards or structural hazards. Assume that 16% of all instructions are branch instructions, and 20% of these branch instructions are Taken. What is the average CPI for this program when it executes on each of the processors listed below? All of these processors implement a 5-stage in-order pipeline and resolve a branch outcome at the end of the 2nd stage (similar to the 5-stage pipeline discussed in class). If it helps, assume that the program has 100 total instructions and would finish in 100 cycles (CPI = 1.0) if it encountered zero stall cycles. Then, figure out the stall cycles for each of the cases below, so for example, 10 stall cycles would equate to an execution time of 110 cycles and a CPI of 1.1. (40 points)
1. The processor pauses instruction fetch as soon as it fetches a branch. Instruction fetch is resumed after the branch outcome has been resolved.
2. The processor always fetches instructions sequentially, i.e., it predicts every branch as being Not-Taken. If a branch is resolved as Taken, the incorrectly fetched instructions after the branch are squashed.
3. The processor implements a branch delay slot. The compiler is able to fill the branch delay slot with an instruction that comes before the branch in the original code.
4. The processor does not implement branch delay slots. Instead, it implements a hardware branch predictor that makes correct predictions for 96% of all branches. When an incorrect prediction is discovered, the incorrectly fetched instructions after the branch are squashed.
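
The CPI bookkeeping described in the problem statement can be sketched as follows. The function names `execution_cycles` and `cpi` are hypothetical, and the only numbers used are the ones from the example in the problem statement (100 instructions, 10 stall cycles); working out the stall cycles for each processor is left to you.

```python
# Minimal sketch of the CPI calculation, assuming a baseline CPI of 1.0
# (one cycle per instruction) when there are no stall cycles.
def execution_cycles(instructions, stall_cycles):
    """Total cycles = one cycle per instruction plus all stall cycles."""
    return instructions + stall_cycles


def cpi(instructions, stall_cycles):
    """Average cycles per instruction."""
    return execution_cycles(instructions, stall_cycles) / instructions


# Example from the problem statement: 100 instructions, 10 stall cycles.
print(execution_cycles(100, 10))  # 110
print(cpi(100, 10))               # 1.1
```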