Hari Sundar

CS 6230: Parallel and High Performance Computing

Office Hours: Wed 1-3pm MEB 3454

TA: Nishith Tirpankar

This course is structured to train students to reason about the design and implementation of efficient parallel programs. The focus areas of the class will be on the modeling, implementation and evaluation of distributed, message-passing interface (MPI) based programs, shared-memory thread-based OpenMP programs, and hybrid (MPI/OpenMP) programs. Almost all examples will be aligned with numerical scientific computing. This course is appropriate for those that want transform serial algorithms into parallel algorithms, want to modify currently existing parallel algorithms, or want to write parallel algorithms from scratch.

This course will be in two parts.

Part A: Parallel Algorithms & Programming (MPI, OpenMP)

In the first part (roughly until spring break), we will focus on the fundamentals of developing parallel & distributed algorithms as well as implementing parallel algorithms using MPI and OpenMP. In the lectures we will study parallel programing models and analysis of parallel algorithms as well as fundamental parallel data structures and algorithms. In parallel you will develop your parallel programming skills by regular prorgamming tutorials and assignments. These will be online tutorials explaining key parallel programming concepts along with simple programming tests. This part will have a stronger CS focus, but I will not make strong assumptions on what you know besides some programming experience, preferably C/C++/Fortran. Here are the high level topics that we will cover,

  1. Parallel Programming Models a. SharedMemory (Work/Depth, PRAM), APIs (OpenMP) b. DistributedMemory (message passing), APIs (MPI)
  2. Discrete Algorithms a. Sorting, Searching, Selecting, Merging b. Trees, Graphs (coloring, partitioning, traversal)

Part B: Parallel Numerical Methods

The second half of this course will focus more strongly on large-scale computational algorithms and techniques for developing scalable computational codes. Prior experience with numerical linear algebra is expected. Our focus will be on the parallelizability and scalability of several numerical algorithms. The main topics that will be covered are,

  1. Dense/SparseLinear Algebra (LU, SOR, Krylov, Nested Dissection)
  2. Parallel Fast Fourier Transform
  3. Parallelizing Finite Difference / Finite Elements
    • Adaptive Mesh Refinement
  4. Iterative solvers
    • Multigrid methods
  5. $n$-body problems, Boundary Integral Equations
    • FastMultipole Method
    • HierarchicalMatrices

The detailed schedule is available on Canvas.

Prerequisites & Expectations

While there are no formal prerequisites, basic understanding of linear algebra, sequential algorithms and data structures is expected. Additionally, reasonable programming skills (preferably C/C++ or Fortran) and familiarity with Linux or Unix is also expected. The assignments/projects will be parallelized using OpenMP and MPI (Message Passing Interface). We will not cover GPU programming, consider CS 6235 instead.

Adherence to the CoE and SoC academic guidelines is expected. Please read the following.


There will be several small programming assignments (~10) in the first half of the semester. These will be roughly 30 minutes of reading and 30 minutes of programming. You will be awarded 60% marks for correctness of your code and the remaining 40% will be based on performance (runtime and scalability) compared to other students. There will also be 2 theoretical assignments that test your ability to analyze parallel algorithms. There will be a take-home midterm exam and a final project (typically implementing a parallel numerical algorithm). A tentative grade division is listed below.

class participation              10% 

programming assignments          25% 

theoretical assignments          15% 

midterm                          20% 

final project                    30% 


We shall be using the Kingspeak cluster for the assignments and projects. Kingspeak is managed by CHPC at the U. Kingspeak (our share) has 12 nodes each with 28 cores and 128GB of RAM. Please fill out this form as soon as possible to get access. I would also recommend that everyone install MPI locally on their machines. On Linux and Mac you can install either MPICH or OpenMPI. On windows, it is preferable to use Microsoft MPI.

Books and reading materials

I shall keep adding reading materials during the semester.