## Lecture 1: CS/ECE 3810 Introduction

- Today's topics:
  - Why computer organization is important
  - Logistics
  - Modern trends


## Why Computer Organization






- Embarrassing if you are a BS in CS/CE and can't make sense of the following terms: DRAM, pipelining, cache hierarchies, I/O, virtual memory, ...
- Embarrassing if you are a BS in CS/CE and can't decide which processor to buy: 4.4 GHz Intel Core i9 or 4.7 GHz AMD Ryzen 9 (reason about performance/power)
- Obvious first step for chip designers, compiler/OS writers
- Will knowledge of the hardware help you write better and more secure programs?

## Must a Programmer Care About Hardware?

- Must know how to reason about program performance, energy, and security
- Memory management: if we understand how/where data is placed, we can help ensure that relevant data is nearby
- Thread management: if we understand how threads interact, we can write smarter multi-threaded programs
  - $\rightarrow$  Why do we care about multi-threaded programs?

200x speedup for matrix vector multiplication

- Data level parallelism: 3.8x
- Loop unrolling and out-of-order execution: 2.3x
- Cache blocking: 2.5x
- Thread level parallelism: 14x

Further, can use accelerators to get an additional 100x.
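One of the optimizations listed above, cache blocking, can be sketched for matrix-vector multiplication. This is a minimal illustration and not the code behind the quoted speedups: the function names and the `block` parameter are hypothetical, and pure Python will not show the performance gain. The sketch only shows how the loops are restructured so that a small tile of `x` is reused across all rows before moving on.

```python
def matvec_naive(A, x):
    """Compute y = A x by walking each full row of A against all of x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matvec_blocked(A, x, block=4):
    """Compute y = A x one column tile at a time.

    Each tile x[j0:j1] is reused for every row before the next tile is
    touched, which is the access pattern that cache blocking aims for.
    `block` is a hypothetical tile width; on real hardware it would be
    chosen so that the tile fits in cache.
    """
    n_rows, n_cols = len(A), len(x)
    y = [0.0] * n_rows
    for j0 in range(0, n_cols, block):
        j1 = min(j0 + block, n_cols)
        for i in range(n_rows):
            row = A[i]
            acc = 0.0
            for j in range(j0, j1):
                acc += row[j] * x[j]
            y[i] += acc  # accumulate this tile's partial dot product
    return y


A = [[1, 2, 3], [4, 5, 6]]
x = [1, 0, -1]
print(matvec_naive(A, x), matvec_blocked(A, x, block=2))
```

Both functions return the same result; only the order in which elements of `A` and `x` are visited differs.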

## Key Topics

- Moore's Law, power wall
- Use of abstractions
- Assembly language $\rightarrow$ C, Java
- Computer arithmetic
- Pipelining
- Using predictions
- Memory hierarchies
- Accelerators
- Reliability and <u>Security</u>



## Logistics

- See class web-page for syllabus/resources https://www.cs.utah.edu/~rajeev/cs3810
- COVID reminders: follow university guidelines
- TAs and office hours: TBA
- Most communication on Canvas; email me directly to set up meetings, or meet me in office hours right after class (MEB 3414)
- Textbook: Computer Organization and Design: The Hardware/Software Interface, Patterson and Hennessy, 5<sup>th</sup> or 6<sup>th</sup> edition

Tue 9:10 am $\rightarrow$ Wed 11:59 pm

- 30% midterm, 40% final, 30% assignments
- ~10 assignments, of which you may skip two; automatic 1.5 day extension until Wed/Fri late night; upload on Canvas
- Co-operation policy: you may discuss, but you may not see someone else's written matter when writing your solution
- Exams are open-notes (3 pages) × 2
- Print slides just before class
- Screencast YouTube videos



- Grading by rank
- No tolerance for cheating (see class webpage)
- Rank in exams matters more than rank in homeworks
- Historically, 90% of students receive grades of B- or higher

## Microprocessor Performance



(Figure annotation: 1.5 $\rightarrow$ 2.25)

Why the lower improvement?

## Microprocessor Performance



Figure: 42 Years of Microprocessor Trend Data. Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten; new plot and data collected for 2010-2017 by K. Rupp. Source: karlrupp.net

Perf = IPC $\times$ freq
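The relation above (performance as instructions per cycle times clock frequency) can be tried out numerically. The sketch below reuses the 4.4 GHz and 4.7 GHz clock speeds mentioned earlier in the lecture, but the IPC values are made-up assumptions chosen only to show that the higher-clocked part is not automatically the faster one.

```python
def instructions_per_second(ipc, freq_hz):
    """Perf = IPC x freq: instructions completed per second."""
    return ipc * freq_hz


# Clock speeds come from the lecture; the IPC values are hypothetical.
proc_a = instructions_per_second(ipc=2.0, freq_hz=4.4e9)
proc_b = instructions_per_second(ipc=1.8, freq_hz=4.7e9)

faster = "A" if proc_a > proc_b else "B"
print(f"A: {proc_a:.3e} instr/s, B: {proc_b:.3e} instr/s, faster: {faster}")
```

With these assumed IPC values, the 4.4 GHz processor completes more instructions per second than the 4.7 GHz one, which is why clock frequency alone is not enough to compare processors.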





- Two roadblocks: power and ideas
- Fixed power budget because of cooling constraints; implies that frequency can't be increased; discourages complex ideas
- End of voltage (Dennard) scaling in early 2010s
- Has led to dark silicon and dim silicon (occasional turbo)
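The link between the power budget, voltage scaling, and frequency in the bullets above can be made concrete with the standard first-order model of dynamic power in CMOS logic. This formula is not on the slide; it is the usual textbook approximation, with $\alpha$ the activity factor, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency:

$$
P_{\text{dynamic}} \approx \alpha \, C \, V^{2} \, f
$$

While $V$ could be lowered each generation, $f$ could rise without increasing $P_{\text{dynamic}}$. Once voltage scaling ends, any increase in $f$ raises power at least linearly, so a fixed cooling budget caps the frequency.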



- Running out of ideas to improve single thread performance
- Power wall makes it harder to add complex features
- Power wall makes it harder to increase frequency
- Additional performance provided by: more cores, occasional spikes in frequency, accelerators

- Historical contributions to performance:
  1. Better processes (faster devices): ~20%
  2. Better circuits/pipelines: ~15%
  3. Better organization/architecture: ~15%

In the future, item 2 will help little and item 1 will eventually disappear!
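To see what the three historical contributions listed above add up to, the sketch below compounds them. It assumes the percentages are independent per-year gains that combine multiplicatively, which the slide does not state explicitly; the dictionary keys and function name are illustrative.

```python
def combined_improvement(rates):
    """Multiply (1 + rate) for each contribution to get the overall factor."""
    factor = 1.0
    for rate in rates:
        factor *= 1.0 + rate
    return factor


# Percentages from the lecture; treating them as annual and independent
# is an assumption made for this illustration.
contributions = {
    "process": 0.20,
    "circuits": 0.15,
    "architecture": 0.15,
}

all_three = combined_improvement(contributions.values())
architecture_only = combined_improvement([contributions["architecture"]])
print(f"all three: {all_three:.3f}x, architecture only: {architecture_only:.3f}x")
```

Under these assumptions the three sources together give roughly a 1.59x gain, while architecture alone gives 1.15x, which illustrates why losing the process and circuit contributions matters.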

|             | Pentium | P-Pro   | P-II    | P-III   | P-4      | Itanium | Montecito |
|-------------|---------|---------|---------|---------|----------|---------|-----------|
| Year        | 1993    | 1995    | 1997    | 1999    | 2000     | 2002    | 2005      |
| Transistors | 3.1M    | 5.5M    | 7.5M    | 9.5M    | 42M      | 300M    | 1720M     |
| Clock speed | 60 MHz  | 200 MHz | 300 MHz | 500 MHz | 1500 MHz | 800 MHz | 1800 MHz  |

Moore's Law: at this point, adding transistors to a core yields little benefit.
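The transistor counts in the table can be turned into an approximate doubling time. The sketch below uses only the first and last columns (Pentium, 1993, 3.1M; Montecito, 2005, 1720M) and assumes steady exponential growth between them, which is a simplification; the function name is illustrative.

```python
import math


def doubling_time_years(count_start, count_end, years_elapsed):
    """Years per doubling, assuming steady exponential growth."""
    doublings = math.log2(count_end / count_start)
    return years_elapsed / doublings


# Endpoints taken from the table above.
pentium_1993 = 3.1e6
montecito_2005 = 1720e6

t_double = doubling_time_years(pentium_1993, montecito_2005, 2005 - 1993)
print(f"{t_double:.2f} years per doubling")
```

With these endpoints the estimate comes out at a little over 1.3 years per doubling. Clock speed in the same table does not follow the same curve, which is the point of the slide.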

## What Does This Mean to a Programmer?

- Today, one can expect only a 20% annual improvement; the improvement is even lower if the program is not multi-threaded
  - A program needs many threads
  - The threads need efficient synchronization and communication
  - Data placement in the memory hierarchy is important
  - Accelerators should be used when possible
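The 20% annual improvement quoted above can be put in perspective with a little compounding arithmetic. This is a sketch that assumes the rate stays constant from year to year; the function names are illustrative.

```python
import math


def speedup_after(years, annual_rate):
    """Cumulative speedup after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


def years_to_speedup(target, annual_rate):
    """Years of compounding needed to reach a target speedup."""
    return math.log(target) / math.log(1.0 + annual_rate)


years_to_double = years_to_speedup(2.0, 0.20)
decade_gain = speedup_after(10, 0.20)
print(f"doubling takes {years_to_double:.1f} years; 10 years gives {decade_gain:.1f}x")
```

At 20% per year, doubling performance takes close to four years, so waiting for faster hardware is a slow path compared with the threading, data-placement, and accelerator techniques in the list above.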

## Challenges for Hardware Designers

- Find efficient ways to
  - improve single-thread performance and energy
  - improve data sharing
  - boost programmer productivity
  - manage the memory system
  - build accelerators for important kernels
  - provide security



- Topics: Trends, Performance, MIPS instruction set architecture (Chapter 2)
- Visit the class web-page https://www.cs.utah.edu/~rajeev/cs3810