Videos: Optimization 2

Note: the second optional video is not included in the playlist.

For further optimizations, we need to understand pipelined architectures and how the processor detects and handles instruction-level parallelism. But we start with an aside on floating-point operations at the machine level, since we’ve only covered integer and address handling previously.

Machine FP 1: Arithmetic instructions 3:56
Introduction to SSE2 instructions (as typically used in x86-64) to support floating-point arithmetic.
Optional Machine FP 2: Vector operations 3:17
About SSE2 support for explicitly parallel, single-instruction multiple-data (SIMD) operations. We don’t use these in the rest of the videos, but it’s good to know that they exist.
Optional Machine FP 3: History of x87 through AVX2 6:04
In case you’re curious, SSE2 is one point in a series of floating-point and SIMD instruction sets.
Optimization 7: Superscalar processors 2:14
An introduction to processors that reorder and pipeline instruction sequences.
Optimization 8: Pipelined execution 3:22
A more specific investigation of how instructions are pipelined and how that can affect the overall time needed to perform a computation.
Optimization 9: Loop unrolling 4:26
Attempting to expose opportunities for parallelism to the compiler by “unrolling” a loop to handle multiple array elements in a single iteration.
Optimization 10: Reassociation 3:08
Making unrolling more successful by choosing to implement a different order of operations, which is acceptable for many applications.
Optimization 11: Throughput bounds 3:54
How the effect of unrolling can be limited by the amount of parallelism available from pipelining, and how additional functional units can help overcome those limits.
Optimization 12: Unrolling more 1:44
Testing and reaching the limits of unrolling to improve performance.
Optimization 13: Branch prediction 4:08
About branch prediction and how it helps enable instruction-level parallelism.
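
As a rough illustration of the loop unrolling and reassociation ideas listed above, here is a minimal sketch in C. The function names and the choice of summing an array of doubles are assumptions made for this example, not necessarily what the videos use. Note that reassociating floating-point additions can change the rounded result, which is why it is only acceptable for some applications.

```c
#include <stddef.h>

/* Baseline: one array element per loop iteration. */
double sum_simple(const double *a, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = acc + a[i];
    return acc;
}

/* Unrolled by 2: two elements per iteration, same order of operations
   as the baseline. Each addition still waits on the previous one. */
double sum_unrolled(const double *a, size_t n) {
    double acc = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        acc = (acc + a[i]) + a[i + 1];
    for (; i < n; i++)          /* leftover element when n is odd */
        acc = acc + a[i];
    return acc;
}

/* Unrolled by 2 and reassociated: a[i] + a[i + 1] does not depend on
   acc, so it can be computed while the previous update to acc is
   still in the pipeline. */
double sum_reassociated(const double *a, size_t n) {
    double acc = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        acc = acc + (a[i] + a[i + 1]);
    for (; i < n; i++)          /* leftover element when n is odd */
        acc = acc + a[i];
    return acc;
}
```

All three functions return the same value whenever the additions are exact (for example, small integer-valued doubles); any speed difference depends on the processor and on compiler options, so it needs to be measured rather than assumed.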
