# CS/ECE 3810: Computer Systems Architecture

Lecture 1: Introduction

Anton Burtsev, September 2022

#### Who am I?

- I build operating systems
  - Since I was your age, i.e., since 2000
- Bits of L4 microkernel, micro-ITRON, XenTT, LCDs, KSplit
  - https://www.cs.utah.edu/~aburtsev/

#### **Prospective students**

I am looking for students interested in operating systems at all levels, from undergraduate to PhD. If you have relevant skills, send me an email.

#### We are building three new operating systems

**RedLeaf:** a clean-slate operating system in Rust designed to support formal verification of functional correctness (project web page).

**Redshift:** a new operating system that aims to support heterogeneous hardware (e.g., FPGAs, GPUs, TPUs, near-storage and near-network cores) as first-class citizens (project web page).

**Horizon:** a new secure hypervisor and secure cloud in which users own their data. Horizon is developed in Rust, relies on novel techniques for hardware and software isolation, and will implement cloud-wide information flow control (project web page).

#### Class details

- Undergraduate
  - 203 students
- Instructor: Anton Burtsev
- Meeting time: 11:50am-1:10pm (Mon/Wed)
- 4 TAs
  - Send us a private message on Piazza
- Web page
  - https://www.cs.utah.edu/~aburtsev/3810

#### More details

- 8-10 small homework assignments
- 1-2 lab-like assignments
  - Require basic familiarity with UNIX
  - Shell, C
- Midterm
- Final
- Grades are curved
  - Homework: 30%, midterm exam: 30%, final exam: 40% of your grade.
  - You can drop two assignments
  - Late submissions are 0

#### This course

- Book: Patterson and Hennessy's
  - Computer Organization and Design

#### Topics

- Understanding performance/cost/power
- Assembly language
- Computer arithmetic
- Pipelining
- Using predictions
- Memory hierarchies
- Accelerators
- Reliability and Security





# Course organization

- Lectures
  - High level concepts and abstractions
  - Recorded (and hopefully live-streamed)
- Reading
  - Patterson and Hennessy
  - Bits of additional notes
- Homeworks
- Exams
  - Open notes (but might include open questions)

Questions?

#### Why this class?



Screenshot: `main.rs` from the `hello-rust` project, open in the Visual Studio Code debugger.

```
use std::{env, process};

mod fileworker;
mod parser;
pub use fileworker::read_file;
pub use parser::parser;

type TokioError = std::io::Error;

#[tokio::main]
async fn main() -> Result<(), TokioError> {
    let args: Vec<String> = env::args().collect();
    println!("arguments: {:?}", args);
    if args.len() != 2 {panic!("no file specified")};
    let file_name = &args[1];
    let json_data = match fileworker::read_file(&file_name).await {
        Ok(data) => data,
        Err(e) => error_message(&file_name, e),
    };
    println!("len = {}", json_data.len());
    println!("{}", json_data);
    parser::parser(&json_data);
    Ok(())
}
```

# But can you make it fast? ... or secure?

# Example 1: Database Join

#### Main-Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware

Cagri Balkesen¹, Jens Teubner¹, Gustavo Alonso¹, M. Tamer Özsu²

¹ Systems Group, Department of Computer Science, ETH Zurich, Switzerland ({name.surname}@inf.ethz.ch)

² University of Waterloo, Canada (tamer.ozsu@uwaterloo.ca)

https://15721.courses.cs.cmu.edu/spring2016/papers/balkesen-icde2013.pdf

# Example 1: Database Join



Fig. 6. Original hash table implementation.

| Bytes | 0–8 | 8–24    | 24–40   | 40–48 |
|-------|-----|---------|---------|-------|
| Field | hdr | tuple 1 | tuple 2 | next  |

Fig. 7. Our hash table implementation.
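
The bucket in Fig. 7 packs a header, two tuples, and a next pointer into one contiguous record. A minimal C sketch of such a layout follows. The field widths follow the byte offsets in the figure on a typical 64-bit platform. The type names, the key/payload split of a tuple, and the use of the low header bits as a tuple count are assumptions made for illustration, not taken from the paper's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte tuple: 8-byte key plus 8-byte payload (assumed split). */
typedef struct {
    uint64_t key;
    uint64_t payload;
} tuple_t;

/* Hypothetical bucket: header, two inline tuples, overflow pointer. */
typedef struct bucket {
    uint64_t       hdr;        /* assumed: low 2 bits hold the tuple count */
    tuple_t        tuples[2];  /* tuples stored inline, next to the header */
    struct bucket *next;       /* overflow chain, NULL if none */
} bucket_t;

/* Probe one bucket chain for `key`.
   Returns 1 and stores the payload in *out on a hit, 0 on a miss. */
static int bucket_lookup(const bucket_t *b, uint64_t key, uint64_t *out)
{
    for (; b != NULL; b = b->next) {
        uint64_t count = b->hdr & 0x3;
        for (uint64_t i = 0; i < count && i < 2; i++) {
            if (b->tuples[i].key == key) {
                *out = b->tuples[i].payload;
                return 1;
            }
        }
    }
    return 0;
}
```

With 8-byte pointers the whole record is 48 bytes, so a probe that hits in the first bucket can usually be served from a single 64-byte cache line, provided the bucket does not straddle a line boundary.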

# **Example 2: Virtualization**

# Virtualization Without Direct Execution or Jitting: Designing a Portable Virtual Machine Infrastructure

Darek Mihocka
Emulators
darekm@emulators.com

Stanislav Shwartsman
Intel Corp.
stanislav.shwartsman@intel.com

https://bochs.sourceforge.io/Virtualization_Without_Hardware_Final.pdf

# **Example 2: Virtualization**

| Bochs version | 1000 MHz Pentium III | 2533 MHz Pentium 4 | 2666 MHz Core 2 Duo |
|---------------|----------------------|--------------------|---------------------|
| Bochs 2.3.5   | 882                  | 595                | 180                 |
| Bochs 2.3.6   | 609                  | 533                | 157                 |
| Bochs 2.3.7   | 457                  | 236                | 81                  |

Table 3.1: Windows XP boot time on different hosts
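
One way to read Table 3.1 is as a speedup between Bochs releases on the same host. The sketch below computes the 2.3.5 to 2.3.7 speedup for each column. The function names are made up for illustration, and the numbers are copied from the table.

```c
#include <stdio.h>

/* Speedup of a new version over an old one, given the two boot times.
   Both times must use the same unit; the unit cancels out. */
static double speedup(double old_time, double new_time)
{
    return old_time / new_time;
}

/* Print the Bochs 2.3.5 -> 2.3.7 speedup for each host in Table 3.1. */
static void print_speedups(void)
{
    const char  *host[] = { "Pentium III", "Pentium 4", "Core 2 Duo" };
    const double v235[] = { 882, 595, 180 };
    const double v237[] = { 457, 236, 81 };

    for (int i = 0; i < 3; i++)
        printf("%-12s %.2fx\n", host[i], speedup(v235[i], v237[i]));
}
```

The same emulator became roughly 1.9x to 2.5x faster across these two releases on unchanged hardware.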

# Example 2: Virtualization

# 3.3 Host branch misprediction as biggest cause of slow emulation performance

Every pipelined processor features branch prediction logic used to predict whether a conditional branch in the instruction flow of a program is likely to be taken or not. Branch predictors are crucial in today's modern, superscalar processors for achieving high performance.

Modern CPU architectures implement a set of sophisticated branch prediction algorithms in order to achieve the highest prediction rate, combining both static and dynamic prediction methods. When a branch instruction is executed, the branch history is stored inside the processor. Once branch history is available, the processor can predict the branch outcome: whether the branch should be taken, and the branch target.
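
To make prediction from branch history concrete, here is a minimal sketch of a 2-bit saturating counter, a textbook dynamic prediction scheme. It is an illustration only: it is not claimed to be what any particular CPU implements, and all names are made up.

```c
#include <stdbool.h>

/* 2-bit saturating counter.
   States 0 and 1 predict not-taken; states 2 and 3 predict taken. */
typedef struct {
    unsigned state;
} predictor_t;

static bool predict(const predictor_t *p)
{
    return p->state >= 2;
}

/* Move one step toward the observed outcome, saturating at 0 and 3. */
static void update(predictor_t *p, bool taken)
{
    if (taken) {
        if (p->state < 3) p->state++;
    } else {
        if (p->state > 0) p->state--;
    }
}

/* Replay a sequence of branch outcomes and count the mispredictions. */
static int mispredictions(const bool *outcomes, int n)
{
    predictor_t p = { 0 };   /* start in the strongly not-taken state */
    int misses = 0;

    for (int i = 0; i < n; i++) {
        if (predict(&p) != outcomes[i])
            misses++;
        update(&p, outcomes[i]);
    }
    return misses;
}
```

Starting from state 0, a branch that is always taken is mispredicted only twice while the counter warms up, whereas a branch that alternates between taken and not-taken is mispredicted every other time.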

#### A typical Bochs instruction handler method:

```
void BX_CPU_C::SUB_EdGd(bxInstruction_c *i)
{
  Bit32u op2_32, op1_32, diff_32;

  op2_32 = BX_READ_32BIT_REG(i->nnn());

  if (i->modC0()) {    // reg/reg format
    op1_32 = BX_READ_32BIT_REG(i->rm());
    diff_32 = op1_32 - op2_32;
    BX_WRITE_32BIT_REGZ(i->rm(), diff_32);
  }
  else {               // mem/reg format
    read_RMW_virtual_dword(i->seg(),
        RMAddr(i), &op1_32);
    diff_32 = op1_32 - op2_32;
    Write_RMW_virtual_dword(diff_32);
  }

  SET_LAZY_FLAGS_SUB32(op1_32, op2_32,
        diff_32);
}
```

Listing 3.1: A typical Bochs instruction handler

Why is it so hard?

## Microprocessor Performance



50% improvement every year!! What contributes to this improvement?
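
A 50% yearly improvement compounds quickly. The sketch below multiplies out the growth factor year by year. The function name is made up, and a constant annual rate is an assumption made for illustration.

```c
/* Relative performance after `years` years at a fixed annual improvement
   rate, where 0.5 means 50% per year. Returns 1.0 for zero years. */
static double compound_growth(double annual_rate, int years)
{
    double factor = 1.0;

    for (int i = 0; i < years; i++)
        factor *= 1.0 + annual_rate;
    return factor;
}
```

For example, `compound_growth(0.5, 10)` is about 57.7, so a decade at this rate yields a machine more than 50 times faster.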

What's inside a typical machine?

### **B360 AORUS Motherboard**



#### **CPU**

Xeon® E3-1200

- 1 CPU socket
  - 4 cores
  - 2 logical threads each (hyperthreads)

## Memory





## Memory abstraction

$\mathrm{WRITE}(\mathit{addr}, \mathit{value}) \rightarrow \varnothing$

Store *value* in the storage cell identified by *addr*.

$\mathrm{READ}(\mathit{addr}) \rightarrow \mathit{value}$

Return the *value* argument to the most recent WRITE call referencing *addr*.
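
The two operations can be sketched in C as an array of cells. The cell type, the memory size, and the function names below are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_CELLS 1024   /* assumed size, for illustration only */

static uint32_t cells[MEM_CELLS];   /* zero-initialized storage cells */

/* WRITE(addr, value): store value in the cell identified by addr.
   Out-of-range addresses are ignored in this sketch. */
static void mem_write(size_t addr, uint32_t value)
{
    if (addr < MEM_CELLS)
        cells[addr] = value;
}

/* READ(addr): return the value of the most recent write to addr.
   Returns 0 for cells never written and for out-of-range addresses. */
static uint32_t mem_read(size_t addr)
{
    return addr < MEM_CELLS ? cells[addr] : 0;
}
```

Calling `mem_write(3, 5)` and then `mem_write(3, 7)` leaves `mem_read(3)` returning 7: only the most recent write to a cell is visible.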



#### Dell R830 4-socket server





Dell PowerEdge R830 System Server with 2 sockets on the main floor and 2 sockets on the expansion

http://www.dell.com/support/manuals/us/en/19/poweredge-r830/r830_om/supported-configurations-for-the-poweredge-r830-system?guid=guid-01303b2b-f884-4435-b4e2-57bec2ce225a&lang=en-us

#### Multi-socket machines



### Dell R830 Motherboard



What does a CPU do internally?



# CPU execution loop

- CPU repeatedly reads instructions from memory
- Executes them
- Example

```
ADD EDX, EAX
// EDX = EAX + EDX
```
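
The loop above can be sketched as a toy interpreter in C. The instruction format and opcodes are invented for illustration and are not real x86 encodings. Only the register names are borrowed from the example.

```c
#include <stddef.h>
#include <stdint.h>

enum { REG_EAX, REG_EDX, NUM_REGS };   /* two registers are enough here */
enum { OP_HALT, OP_ADD };              /* made-up opcodes */

/* Made-up fixed-size instruction: opcode plus two register indices. */
typedef struct {
    uint8_t op;
    uint8_t dst;
    uint8_t src;
} instr_t;

/* Fetch, decode, and execute instructions until OP_HALT.
   Returns the number of instructions executed, not counting the halt.
   Unknown opcodes are treated as no-ops in this sketch. */
static size_t run(const instr_t *program, uint32_t regs[NUM_REGS])
{
    size_t pc = 0;        /* index of the next instruction to fetch */
    size_t executed = 0;

    for (;;) {
        instr_t in = program[pc++];        /* fetch */
        if (in.op == OP_HALT)              /* decode */
            break;
        if (in.op == OP_ADD)               /* execute: dst = dst + src */
            regs[in.dst] += regs[in.src];
        executed++;
    }
    return executed;
}
```

A program consisting of `{ OP_ADD, REG_EDX, REG_EAX }` followed by `{ OP_HALT, 0, 0 }` mirrors `ADD EDX, EAX`: with EAX = 2 and EDX = 3 on entry, EDX holds 5 afterwards and EAX is unchanged.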



# A simple 5-stage pipeline



# Thank you!