sorting

strategies for parallelization

divide & conquer

randomization

quicksort

sequential quicksort

expected \(\mathcal{O}(n\log n)\), worst case \(\mathcal{O}(n^2)\)
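A minimal sketch of sequential quicksort with a randomized pivot, tying together the divide & conquer and randomization strategies listed above. The function name `quicksort` and the three-way split are illustrative choices, not taken from the slides; the sketch favors clarity over in-place partitioning.

```python
import random


def quicksort(a, rng=random):
    """Return a sorted copy of `a`, choosing each pivot uniformly at random."""
    if len(a) <= 1:
        return list(a)
    pivot = a[rng.randrange(len(a))]
    # Divide: three-way split so that repeated keys do not cause deep recursion.
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    # Conquer: the two sides are independent, which is what a parallel version exploits.
    return quicksort(less, rng) + equal + quicksort(greater, rng)
```

The two recursive calls share no data, so they could run as separate tasks; the partitioning step itself stays sequential in this form.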



Hyper-Quicksort

quicksort on a hypercube

HyperQuicksort

Wagar ’87





load (im)balance issues \(\rightarrow\) pick global median

\(\log p\) stages

data movement \(\mathcal{O}(n \log p)\)
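The stages can be illustrated with a single-process simulation, where each list in `blocks` stands in for the data held by one hypercube node. This is a sketch under assumptions: the name `hyperquicksort`, the `moved` counter, and the pivot rule (median of the first non-empty node in each subcube, rather than a true global median) are illustrative and not taken from the slides.

```python
import bisect
from heapq import merge


def hyperquicksort(blocks):
    """Simulate HyperQuicksort on p = 2**d nodes; returns (blocks, keys_moved)."""
    p = len(blocks)
    if p == 0 or p & (p - 1):
        raise ValueError("number of processes must be a power of two")
    data = [sorted(b) for b in blocks]   # initial local sort
    moved = 0                            # total keys exchanged between partners
    size = p                             # size of the current subcubes
    while size > 1:                      # log2(p) stages
        half = size // 2
        for base in range(0, p, size):   # subcubes proceed independently
            # Assumed pivot rule: local median of the first non-empty node.
            donor = next((d for d in data[base:base + size] if d), None)
            if donor is None:
                continue
            pivot = donor[len(donor) // 2]
            for lo in range(base, base + half):
                hi = lo + half           # partner across the highest dimension
                cut_lo = bisect.bisect_right(data[lo], pivot)
                cut_hi = bisect.bisect_right(data[hi], pivot)
                send_up = data[lo][cut_lo:]     # keys > pivot go to the upper half
                send_down = data[hi][:cut_hi]   # keys <= pivot go to the lower half
                moved += len(send_up) + len(send_down)
                # Both pieces are already sorted, so a merge is enough.
                data[lo] = list(merge(data[lo][:cut_lo], send_down))
                data[hi] = list(merge(data[hi][cut_hi:], send_up))
        size = half
    return data, moved
```

Each stage moves at most \(n\) keys in total, so `moved` never exceeds \(n \log p\). With a poor pivot the node sizes drift apart, which is the load imbalance noted above.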

HyperQuicksort

Communication Pattern


bucket sort

bucket sort


  • Assume input is uniformly distributed over an interval \([a,b]\)
  • Divide interval into \(m\) equal sized intervals (buckets)
  • Drop elements into appropriate buckets
  • Sort each bucket (say using quicksort)
  • Expected cost \(\mathcal{O}(n + n\log\frac{n}{m})\): \(\mathcal{O}(n)\) to distribute, plus \(m\) bucket sorts of about \(\frac{n}{m}\) elements each
    • For \(m=\Theta(n)\rightarrow \mathcal{O}(n)\) sorting
  • Radix sort
    • dense, uniform distribution
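The bucket sort steps above can be sketched as follows. The name `bucket_sort` and its parameters are illustrative; Python's built-in `sorted` stands in for the per-bucket quicksort, and the input is assumed to lie in \([a,b]\).

```python
def bucket_sort(xs, a, b, m):
    """Sort values assumed to lie in [a, b] using m equal-width buckets."""
    if m <= 0 or b <= a:
        raise ValueError("need m > 0 and a < b")
    width = (b - a) / m
    buckets = [[] for _ in range(m)]
    for x in xs:
        if not a <= x <= b:
            raise ValueError("value outside [a, b]")
        # Map x to its bucket; x == b is clamped into the last bucket.
        i = min(int((x - a) / width), m - 1)
        buckets[i].append(x)
    out = []
    for bucket in buckets:
        # Buckets are independent, so each could be sorted by a separate task.
        out.extend(sorted(bucket))
    return out
```

If the input is far from uniform, most keys land in a few buckets and the cost falls back to that of the per-bucket sort.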

samplesort

samplesort

Huang ’83




samplesort

communication pattern


samplesort


  • sort locally, sample evenly, \(p-1\) samples per task
  • load balance \(\rightarrow 2n/p\)
  • \(\mathcal{O}(p^2)\) samples are a bottleneck for large \(p\)
  • all2all communication \(\rightarrow\) \(n\) keys in \(p^2\) messages

samplesort


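  • Sort locally
  • Select \(p-1\) evenly spaced samples per process
  • Gather the samples at \(p_0\)
  • Sort the \(p(p-1)\) samples at \(p_0\) and pick \(p-1\) evenly spaced splitters
  • Broadcast the splitters
  • Partition local data by the splitters and exchange buckets (all2all)
  • Sort locally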
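A single-process simulation of the steps above, including the all2all exchange mentioned earlier, might look like this. Each list in `blocks` stands in for one process's data. The name `samplesort` and the exact sample and splitter indices are assumptions made for illustration; the slides do not fix them.

```python
import bisect


def samplesort(blocks):
    """Simulate samplesort on p processes; returns p sorted, ordered buckets."""
    p = len(blocks)
    local = [sorted(b) for b in blocks]              # sort locally
    samples = []
    for blk in local:                                # p-1 evenly spaced samples each
        if blk:
            samples.extend(blk[(j * len(blk)) // p] for j in range(1, p))
    if not samples:                                  # p == 1 or no data at all
        everything = sorted(x for blk in local for x in blk)
        return [everything] + [[] for _ in range(p - 1)]
    samples.sort()                                   # gather and sort at p_0
    # p_0 picks p-1 evenly spaced splitters and broadcasts them.
    splitters = [samples[(j * len(samples)) // p] for j in range(1, p)]
    buckets = [[] for _ in range(p)]
    for blk in local:
        # Each process cuts its sorted block at the splitters ...
        cuts = [0] + [bisect.bisect_right(blk, s) for s in splitters] + [len(blk)]
        for k in range(p):
            # ... and sends piece k to process k (p * p messages overall).
            buckets[k].extend(blk[cuts[k]:cuts[k + 1]])
    return [sorted(b) for b in buckets]              # final local sort
```

Printing `[len(b) for b in samplesort(blocks)]` for a few inputs gives a quick view of how evenly the splitters balance the load.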

samplesort

load balance


guarantees no more than \(2n/p\) elements per bucket

Proof ?

Outline

  • compute lower and upper bounds on the number of splitters
  • relationship between number of splitters and number of elements
  • maximum load on any process: \(n - ub - lb\)

self-test questions


  • prove that the maximum load on any process using samplesort is at most \(2n/p\)
  • compare hyperquicksort and samplesort for different values of \(n\) and \(p\)
  • how would you select the median for hyperquicksort?
  • in general, how would you perform a \(k\)-Select?