

### Administrative

- Schedule for the rest of the semester
  - "Midterm Quiz" = long homework
    - Return by Dec. 15
- Projects
  - 1-page status report due TODAY
    - `handin cs4961 pstatus <file>` (ASCII or PDF OK)
  - Poster session dry run (to see material) Dec. 8
  - Poster details (next slide)
- Mailing list: cs4961@list.eng.utah.edu

12/03/09

UNIVERSITY OF UTAH

### Poster Details

- I am providing:
  - Foam core, tape, push pins, easels
- Plan on 2 ft by 3 ft or so of material (9-12 slides)
- Content:
  - Problem description and why it is important
  - Parallelization challenges
  - Parallel algorithm
  - How are the two programming models combined?
  - Performance results (speedup over sequential)
- Example
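For the performance-results panel, speedup over sequential is usually reported as $S_p = T_{seq} / T_p$, where $T_{seq}$ is the time of the sequential program (not the parallel code run on one processor) and $T_p$ the time on $p$ processors; efficiency $E_p = S_p / p$ is a common companion number. The Python sketch below is only an illustration: the helper names `best_time` and `speedup_table` and any timings you feed them are hypothetical, not part of the course tools.

```python
import time

def best_time(fn, *args, repeats=3):
    """Wall-clock time of fn(*args): minimum of several runs to reduce noise."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return min(times)

def speedup_table(t_seq, t_par):
    """Map each processor count p to (speedup, efficiency).

    t_seq: time of the sequential version (not the parallel code on 1 core).
    t_par: {p: time of the parallel version on p processors}.
    """
    table = {}
    for p, t_p in sorted(t_par.items()):
        s = t_seq / t_p          # speedup    S_p = T_seq / T_p
        table[p] = (s, s / p)    # efficiency E_p = S_p / p
    return table
```

For example, with made-up timings `speedup_table(8.0, {2: 4.0, 4: 2.5})`, the 4-processor entry is a speedup of about 3.2 at an efficiency of about 0.8. Plotting $S_p$ against $p$ next to the ideal line $S_p = p$ makes the gap easy to see on a poster.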


### Outline

- Last New Topic: Transactional Memory
- General:
  - Where parallel hardware is headed
  - Where parallel software is headed
  - Parallel programming languages
- Sources for today's lecture
  - Transactional Coherence and Consistency, ASPLOS 2004, Stanford University
  - Vivek Sarkar, Rice University


## Transactional Memory: Motivation

Multithreaded programming requires:

- Synchronization through barriers, condition variables, etc.
- Shared variable access control through locks  $\ldots$
- Locks are inherently difficult to use
  - Locking design must balance performance and correctness
    - Coarse-grain locking: lock contention
    - Fine-grain locking: extra overhead, more error-prone
  - Must be careful to avoid deadlocks or races in locking
  - Must not leave anything shared unprotected, or the program may fail
- Parallel performance tuning is unintuitive
  - Performance bottlenecks appear through low-level events, such as false sharing, coherence misses, ...
- Is there a simpler model with good performance?
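The coarse-grain versus fine-grain tradeoff, and the deadlock hazard that comes with finer locking, can be made concrete with a bank-transfer example. This is a minimal Python sketch assuming a hypothetical `Account` type: the coarse-grain version serializes every transfer behind one global lock, while the fine-grain version uses one lock per account and must acquire the two locks in a fixed global order (by account id here) so that two opposite transfers cannot deadlock. Under CPython's global interpreter lock this shows correctness, not speedup.

```python
import itertools
import threading

class Account:
    """Hypothetical bank account protected by its own (fine-grain) lock."""
    _ids = itertools.count()   # global order used for lock acquisition

    def __init__(self, balance):
        self.id = next(Account._ids)
        self.balance = balance
        self.lock = threading.Lock()

# Coarse-grain: one lock for all accounts. Simple and deadlock-free,
# but every transfer serializes on BANK_LOCK (lock contention).
BANK_LOCK = threading.Lock()

def transfer_coarse(src, dst, amount):
    with BANK_LOCK:
        src.balance -= amount
        dst.balance += amount

# Fine-grain: lock only the two accounts involved, so transfers on
# disjoint accounts can overlap. Locks are always taken in id order;
# otherwise thread 1 (a->b) could hold a.lock while thread 2 (b->a)
# holds b.lock, and each would wait for the other forever.
def transfer(src, dst, amount):
    if src is dst:
        return  # threading.Lock is not reentrant; avoid self-deadlock
    first, second = (src, dst) if src.id < dst.id else (dst, src)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

def _repeat(n, src, dst):
    for _ in range(n):
        transfer(src, dst, 1)

def run_demo(n=10_000):
    """Two threads transfer in opposite directions; the totals must be conserved."""
    a, b = Account(100), Account(100)
    threads = [threading.Thread(target=_repeat, args=(n, a, b)),
               threading.Thread(target=_repeat, args=(n, b, a))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

`run_demo()` finishes with both balances back at 100. Note that the two schemes protect the same data with different locks, so mixing `transfer_coarse` and `transfer` on the same accounts would reintroduce races: exactly the kind of "must not leave anything shared unprotected" mistake listed above.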


### A Looming Software Crisis?

- Architectures are getting increasingly complex
  - Multiple cores, deep memory hierarchies, software-controlled storage, shared resources, SIMD compute engines, heterogeneity, ...
- Performance optimization is getting more important
  - Today's sequential and parallel applications *may not* be faster on tomorrow's architectures
  - Especially if you want to add new capability!
  - Managing data locality is even more important than parallelism

### Complexity!


### Exascale Software Challenges

- Exascale architectures will be fundamentally different
  - Power management THE issue
  - Memory reduction to 0.01 bytes/flop
  - Hierarchical, heterogeneous

- Basic rethinking of the software "stack"
  - Ability to express and manage locality and parallelism for ~billion threads will require fundamental change
  - Support applications that are forward scalable and portable
  - Managing power (although locality helps there) and resilience requirements
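A back-of-the-envelope calculation makes these numbers concrete. It assumes "exascale" means a peak of $10^{18}$ flop/s, reads "0.01 bytes/flop" as memory capacity per flop/s of peak rate, and treats the per-thread rate as a simple average; these readings are assumptions, not figures taken from the study.

$$
\begin{aligned}
\text{memory} &\approx 10^{18}\ \tfrac{\text{flop}}{\text{s}} \times 0.01\ \tfrac{\text{bytes}}{\text{flop/s}} = 10^{16}\ \text{bytes} = 10\ \text{PB},\\
\text{rate per thread} &\approx \frac{10^{18}\ \text{flop/s}}{10^{9}\ \text{threads}} = 10^{9}\ \text{flop/s} = 1\ \text{Gflop/s}.
\end{aligned}
$$

At 1 byte per flop/s, the same machine would need $10^{18}$ bytes (1 EB), 100 times more. At 0.01, each thread has on average only about $10^{16}/10^{9} = 10^{7}$ bytes (10 MB) of memory, which is one reason locality management becomes central.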

Sarkar, Harrod and Snavely, "Software Challenges in Extreme Scale Systems," SciDAC 2009. Summary of results from a DARPA study entitled "Exascale Software Study" (see http://users.ece.gatech.edu/%7Emrichard/ExascaleComputingStudyReports/ECS report).


### Motivation: Lessons at the Extreme End

- HPC programmers are more willing than most to suffer to get good performance
  - But the pain is growing with each new architecture
  - And the application base is expanding (*e.g.,* dynamic, graph-based applications)
- Government funding is inadequate to make these systems usable
- Therefore, the best hope is to leverage commodity solutions
  - Also, an interesting and fertile area of research lies in this intersection

| Layer                                 | Examples / research areas                                                                                                                                   |
|---------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Domain-specific Programming Models    | Domain-specific implicitly parallel programming models, e.g., Matlab, stream processing, map-reduce (Sawzall)                                              |
| Middleware                            | Parallelism in middleware, e.g., transactions, relational databases, web services, J2EE containers                                                         |
| Application Libraries                 | Parallel application libraries, e.g., linear algebra, graphics imaging, signal processing, security                                                        |
| Programming Tools                     | Parallel debugging and performance tools, e.g., Eclipse Parallel Tools Platform, TotalView, Thread Checker                                                 |
| Languages                             | Explicitly parallel languages, e.g., OpenMP, Java Concurrency, .NET Parallel Extensions, Intel TBB, CUDA, Cilk, MPI, Unified Parallel C, Co-Array Fortran, X10, Chapel, Fortress |
| Static & Dynamic Optimizing Compilers | Parallel intermediate representation, optimization of synchronization & data transfer, automatic parallelization                                           |
| Multicore Back-ends                   | Code partitioning for accelerators, data transfer optimizations, SIMDization, space-time scheduling, power management                                     |
| Parallel Runtime & System Libraries   | Parallel runtime and system libraries for task scheduling, synchronization, parallel data structures                                                       |
| OS and Hypervisors                    | Virtualization, scalable management of heterogeneous resources per core (frequency, power)                                                                 |

### Motivation: A Few Observations

- Overlap of requirements for petascale scientific computing and mainstream multi-core embedded and desktop computing.
- Many new and "commodity" application domains are similar to scientific computing.
  - Communication, speech, graphics and games, some cognitive algorithms, biomedical informatics (& other "RMS" applications)
- Importance of work with real applications (who is your client?).
  - Biomedical imaging, molecular dynamics simulation, nuclear fusion, computational chemistry, speech recognition, knowledge discovery, ...


### Where is compiler research going?

Main research directions:

- Make parallel programming mainstream
- Write compilers capable of self-improvement [autotuners]
- Performance models to support optimizations for parallel code
- Enable development of software as reliable as an airplane
- Enable system software that is secure at all levels
- Verify the entire software stack
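Autotuning, in its simplest empirical form, generates several variants of a kernel (here, one per block size), checks that they agree, times each on the target machine, and keeps the fastest. The Python sketch below is a toy illustration under that assumption: `blocked_sum`, `autotune`, and the candidate block sizes are hypothetical, and real autotuners search much larger spaces (tile sizes, unroll factors, ...), often with compiler support, rather than timing a Python loop.

```python
import time

def blocked_sum(data, block):
    """Toy kernel with one tunable parameter: sum `data` in chunks of `block` items."""
    total = 0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total

def _time_once(kernel, args, param):
    start = time.perf_counter()
    kernel(*args, param)
    return time.perf_counter() - start

def autotune(kernel, args, candidates, repeats=3):
    """Empirically pick the fastest parameter value for kernel(*args, param).

    Every variant is checked against the first candidate's result, since a
    tuned variant must compute the same answer, only faster.
    """
    reference = kernel(*args, candidates[0])
    best, best_time = None, float("inf")
    for c in candidates:
        if kernel(*args, c) != reference:
            raise ValueError(f"variant {c} changed the result")
        t = min(_time_once(kernel, args, c) for _ in range(repeats))
        if t < best_time:
            best, best_time = c, t
    return best, best_time
```

Usage: `autotune(blocked_sum, (list(range(10_000)),), [16, 256, 4096])` returns whichever block size ran fastest on this machine, together with its time; the winner can differ from machine to machine, which is the point of tuning empirically rather than by a fixed rule.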

Hall, Padua and Pingali, "Compiler Research: The Next Fifty Years," CACM, Feb. 2009. Results of an NSF workshop entitled "The Future of Compiler Research and Education," held at USC/ISI in Feb. 2007.










### Future Directions: New Architectures