sorting

mergesort

b = Merge_Sort(a, n)
  if n < 100
    return seqSort(a, n);                    % small inputs: sort sequentially
  b1 = Merge_Sort(a[0,…,n/2-1], n/2);
  b2 = Merge_Sort(a[n/2,…,n-1], n - n/2);    % second half has n - n/2 elements when n is odd
  return Merge(b1, b2);
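a minimal Python sketch of the pseudocode above, run sequentially. the name `merge_sort`, the constant `CUTOFF`, and the use of `sorted` as a stand-in for `seqSort` are assumptions for illustration. the two recursive calls touch disjoint halves, so they are the natural place to spawn parallel tasks.

```python
from heapq import merge

CUTOFF = 100  # below this size, fall back to a sequential sort (assumed value, taken from the pseudocode)

def merge_sort(a):
    """Sort list a with top-down mergesort and a sequential cutoff."""
    n = len(a)
    if n < CUTOFF:
        return sorted(a)              # stand-in for seqSort
    b1 = merge_sort(a[:n // 2])       # independent of b2: could run in parallel
    b2 = merge_sort(a[n // 2:])       # gets n - n//2 elements
    return list(merge(b1, b2))        # sequential two-way merge
```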

work-optimal parallel merge

three steps
- partition $B$ into blocks with $\log m$ elements
- rank splitters of $B$ in $A$
- merge blocks $B_i$ and $A_i$ sequentially

partition $B$ into $m/\log m$ blocks, each with $K=\log m$ elements
parallel for $i=1:m/\log m$
- $r_i = \texttt{seq\_rank}(b_{iK}, A)$, the rank in $A$ of the last element of block $B_i$
partition $A$ accordingly
- block $A_i = (a_{r_{i-1}+1},\ldots,a_{r_i})$, with $r_0=0$
merge each pair of blocks $A_i$ and $B_i$ sequentially, in $\mathcal{O}(\log n)$ time when $|A_i|$ is also logarithmic
but if $|A_i|\gg|B_i|=\log m$, merge that pair with a recursive call par_merge$(B_i, A_i)$
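the steps above can be sketched in Python as follows. this is a sequential simulation, not a parallel implementation: the name `blocked_merge` is hypothetical, `bisect_right` stands in for `seq_rank`, and the recursive fallback for very large $A_i$ is left out. each rank computation and each block merge is independent of the others, which is what a parallel for would exploit.

```python
import math
from bisect import bisect_right
from heapq import merge as seq_merge

def blocked_merge(A, B):
    """Merge sorted lists A and B by cutting B into blocks of about log2(m) elements."""
    m = len(B)
    if m == 0:
        return list(A)
    K = max(1, int(math.log2(m)))          # block size
    starts = range(0, m, K)                # first index of each block of B
    # rank the last element (splitter) of every block of B in A
    ranks = [bisect_right(A, B[min(s + K, m) - 1]) for s in starts]
    ranks[-1] = len(A)                     # the last block of A takes the remainder
    bounds = [0] + ranks
    out = []
    for i, s in enumerate(starts):         # independent block merges
        A_i = A[bounds[i]:bounds[i + 1]]
        B_i = B[s:s + K]
        out.extend(seq_merge(A_i, B_i))
    return out
```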

basics

background

input specification
- each process has $n/p$ elements
- an ordering of the processes is specified
output specification
- each process will get $n/p$ consecutive elements of the final sorted array
- which chunk is determined by the process ordering

basic operation

compare-split
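a minimal sketch of compare-split, assuming the usual definition: two neighbouring processes exchange their sorted blocks, each merges both blocks, and the lower-ranked process keeps the smaller half while the other keeps the larger half. the function name and the single-process simulation are assumptions for illustration.

```python
from heapq import merge

def compare_split(block_a, block_b):
    """Return (low, high): low keeps the len(block_a) smallest elements, high the rest.

    Both inputs must already be sorted.
    """
    merged = list(merge(block_a, block_b))
    k = len(block_a)
    return merged[:k], merged[k:]
```

for example, `compare_split([1, 6, 8, 11, 13], [2, 7, 9, 10, 12])` returns `([1, 2, 6, 7, 8], [9, 10, 11, 12, 13])`.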

sorting networks

sorting is one of the fundamental problems in computer science
for a long time researchers have focused on the question “how fast can we sort $n$ elements?”
serial
- $\Omega(n\log n)$ lower bound for comparison-based sorting
parallel
- $\mathcal{O}(1), \mathcal{O}(\log n), \mathcal{O}(???)$
sorting networks
- custom-made hardware for sorting!
  - hardware & algorithm
  - mostly of theoretical interest but fun to study!

elements of sorting networks

key idea

perform many comparisons in parallel

key elements

comparators and network topology
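a small sketch of these two elements, assuming a network is stored as a list of columns and each column as a list of wire pairs `(lo, hi)`. the names `NETWORK_4` and `run_network` are hypothetical. the example is the well-known 5-comparator network for 4 inputs; comparators within one column touch disjoint wires, so a column takes one parallel step.

```python
# each column lists comparators that can fire at the same time
NETWORK_4 = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(1, 2)],
]

def run_network(network, values):
    """Push values through the comparator network; wire lo ends up with the smaller value."""
    wires = list(values)
    for column in network:
        for lo, hi in column:
            if wires[lo] > wires[hi]:
                wires[lo], wires[hi] = wires[hi], wires[lo]
    return wires
```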

bitonic sort

a sorting network with $\mathcal{O}(\log^2n)$ columns

bitonic sequence

a bitonic sequence is a sequence of elements $(a_0,a_1,\ldots,a_{n-1})$ with the property that either (1) there exists an index $i, 0\leq i\leq n-1$, such that $(a_0,\ldots,a_i)$ is monotonically increasing and $(a_{i+1},\ldots,a_{n-1})$ is monotonically decreasing, or (2) there exists a cyclic shift of indices so that (1) is satisfied.
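the definition can be checked directly with a short Python sketch. the helper name `is_bitonic` is an assumption, and "monotonically" is read here as non-strict, so equal neighbours are allowed. for example, (1, 2, 4, 7, 6, 0) satisfies (1), and (8, 9, 2, 1, 0, 4) satisfies (2) after a cyclic shift to (4, 8, 9, 2, 1, 0).

```python
def is_bitonic(seq):
    """True if some cyclic shift of seq is non-decreasing and then non-increasing."""
    seq = list(seq)
    n = len(seq)

    def up_then_down(s):
        i = 0
        while i + 1 < n and s[i] <= s[i + 1]:   # climb the increasing part
            i += 1
        while i + 1 < n and s[i] >= s[i + 1]:   # descend the decreasing part
            i += 1
        return i >= n - 1                       # reached the end of the sequence

    return any(up_then_down(seq[k:] + seq[:k]) for k in range(max(n, 1)))
```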

why bitonic sequences?

a bitonic sequence can be easily sorted in increasing/decreasing order

Let $s=(a_0,\ldots,a_{n-1})$ be a bitonic sequence such that \[a_0\leq a_1 \leq \cdots \leq a_{n/2-1}\] and \[a_{n/2}\geq a_{n/2+1} \geq \cdots \geq a_{n-1}.\] Consider the following subsequences of $s$:

\[s_1 \leftarrow ( \min(a_0, a_{n/2}), \min(a_1, a_{n/2+1}), \ldots, \min(a_{n/2-1}, a_{n-1}) ) \]

\[s_2 \leftarrow ( \max(a_0, a_{n/2}), \max(a_1, a_{n/2+1}), \ldots, \max(a_{n/2-1}, a_{n-1}) ) \]

every element of $s_1$ will be $\leq$ every element of $s_2$
both $s_1$ and $s_2$ are bitonic sequences
so how can we sort bitonic sequences?
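a small worked example of the split, as a sketch. the function name `bitonic_split` and the sample sequence are chosen for illustration. here $s_1 = (1,3,4,2)$ and $s_2 = (8,6,5,7)$: every element of $s_1$ is at most 4, every element of $s_2$ is at least 5, and both halves are again bitonic ($s_2$ after a cyclic shift).

```python
def bitonic_split(s):
    """Split an even-length bitonic sequence into element-wise minima and maxima of its halves."""
    half = len(s) // 2
    s1 = [min(s[i], s[i + half]) for i in range(half)]
    s2 = [max(s[i], s[i + half]) for i in range(half)]
    return s1, s2

s = [1, 3, 5, 7, 8, 6, 4, 2]      # increasing first half, decreasing second half
s1, s2 = bitonic_split(s)
assert max(s1) <= min(s2)         # the first property listed above
```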

bitonic merging network

a comparator network that takes a bitonic sequence as input and sorts it by performing a sequence of bitonic splits

figure: bitonic merging network $\oplus$BM[16] (increasing)

are we done?

given a set of elements, how do we re-arrange them into a bitonic sequence?
key idea
- use successively larger bitonic networks to transform the set into a bitonic sequence

make bitonic

complexity

how many columns of comparators are required to sort $n=2^k$ elements?

in other words, what is the depth $d(n)$ of the network?

\[ d(n) = d(n/2) + \log n = \mathcal{O}(\log^2 n)\]
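unrolling the recurrence gives the exact depth. the base case $d(1)=0$ (a single element needs no comparators) is an assumption, since it is not stated above.

```latex
\[ d(2^k) = \sum_{j=1}^{k} j = \frac{k(k+1)}{2}
   \quad\Longrightarrow\quad
   d(n) = \frac{\log n\,(\log n + 1)}{2} = \mathcal{O}(\log^2 n) \]
```

for $n=16$ this gives $4\cdot 5/2 = 10$ columns.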

bitonic sort

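// bitonic split: element-wise min and max of the two halves of b
bitonic [b1, b2] = split (bitonic b) {
  n = length(b);
  for (i=0; i<n/2; ++i) {
    b1[i] = min( b[i], b[i+n/2] );
    b2[i] = max( b[i], b[i+n/2] );
  }
}

// same split, but the larger elements come first
bitonic [b1, b2] = reverse_split (bitonic b) {
  [b2, b1] = split(b);
}

// sorts a bitonic sequence in increasing order
sequence s = sort_bitonic(bitonic b) {
  if (length(b) == 1) return b;   // base case
  [b1, b2] = split(b);
  s = [sort_bitonic(b1), sort_bitonic(b2)];
}

// sorts a bitonic sequence in decreasing order
sequence s = reverse_sort_bitonic(bitonic b) {
  if (length(b) == 1) return b;   // base case
  [b1, b2] = reverse_split(b);
  s = [reverse_sort_bitonic(b1), reverse_sort_bitonic(b2)];
}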

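bitonic b = make_bitonic(sequence a) {
  n = length(a);
  if (n <= 2) return a;   // at most 2 elements: already bitonic

  a1 = a[  0,...,n/2-1];
  a2 = a[n/2,...,n-1];
  
  b1 = make_bitonic(a1);
  b2 = make_bitonic(a2);
  
  b = merge_bitonic(b1,b2);
}

// increasing half followed by decreasing half gives a bitonic sequence
bitonic b = merge_bitonic(b1, b2) {
  b1 = sort_bitonic(b1);
  b2 = reverse_sort_bitonic(b2);
  b = [b1, b2];
}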
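the pseudocode above can be put together into a runnable Python sketch. the top-level name `bitonic_sort` and the `increasing` flag (which folds `reverse_split` and `reverse_sort_bitonic` into one function) are assumptions for illustration; the input length is assumed to be a power of two, as in the complexity analysis.

```python
def split(b):
    """Bitonic split: element-wise minima and maxima of the two halves."""
    half = len(b) // 2
    lo = [min(b[i], b[i + half]) for i in range(half)]
    hi = [max(b[i], b[i + half]) for i in range(half)]
    return lo, hi

def sort_bitonic(b, increasing=True):
    """Sort a bitonic sequence by recursive splitting."""
    if len(b) <= 1:
        return list(b)
    lo, hi = split(b)
    first, second = (lo, hi) if increasing else (hi, lo)
    return sort_bitonic(first, increasing) + sort_bitonic(second, increasing)

def make_bitonic(a):
    """Rearrange a into a bitonic sequence: increasing half, then decreasing half."""
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    b1 = make_bitonic(a[:half])
    b2 = make_bitonic(a[half:])
    return sort_bitonic(b1, True) + sort_bitonic(b2, False)

def bitonic_sort(a):
    """Full bitonic sort; len(a) must be a power of two."""
    assert len(a) & (len(a) - 1) == 0, "length must be a power of two"
    return sort_bitonic(make_bitonic(a))
```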

bitonic sort on a hypercube

one element per process case
how do we map the algorithm onto a hypercube?
- what is the comparator?
- how do the wires get mapped?

communication pattern

communication characteristics of bitonic sort on a hypercube. during each stage of the algorithm, processes communicate along the dimensions shown.

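function y = hcube_bitonic_sort(p, id, x)
  d = log2(p);                               % hypercube dimension, p = 2^d
  
  for i=0:d-1                                % stage i merges sequences of length 2^(i+1)
    for j=i:-1:0                             % communicate along dimension j
      partner = flip_bit(id, j);             % id xor 2^j
      if id.bit(i+1) == id.bit(j)
        x = comp_exchange_min(x, partner);   % keep the smaller element
      else
        x = comp_exchange_max(x, partner);   % keep the larger element
      end
    end
  end
  y = x;
end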
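the loop structure above can be simulated on a single machine. in this sketch the list index plays the role of the process id and a list update stands in for the message exchange; the name `hypercube_bitonic_sort` and this simulation setup are assumptions, not a message-passing implementation.

```python
def hypercube_bitonic_sort(values):
    """Simulate bitonic sort on a hypercube with one element per process."""
    p = len(values)
    d = p.bit_length() - 1
    assert p >= 1 and p == 1 << d, "number of processes must be a power of two"
    x = list(values)                          # x[pid] is the element held by process pid
    for i in range(d):                        # stage i
        for j in range(i, -1, -1):            # exchange along dimension j
            new_x = list(x)
            for pid in range(p):
                partner = pid ^ (1 << j)      # flip bit j of the id
                keep_min = ((pid >> (i + 1)) & 1) == ((pid >> j) & 1)
                pair = (x[pid], x[partner])
                new_x[pid] = min(pair) if keep_min else max(pair)
            x = new_x                         # all processes advance together
    return x
```

with `p = 8` the inner loop visits dimensions 0, then 1, 0, then 2, 1, 0.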

more than one element per process

hypercube

\[ T_p = \underbrace{\mathcal{O}\left(\frac{n}{p}\log \frac{n}{p}\right)}_{\text{local sort}} + \underbrace{\mathcal{O}\left(\frac{n}{p}\log^2 p\right)}_{\text{comparisons}} + \underbrace{\mathcal{O}\left(\frac{n}{p}\log^2 p\right)}_{\text{communication}} \]
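the three terms correspond to a local sort followed by $\mathcal{O}(\log^2 p)$ compare-split steps on blocks of $n/p$ elements. a single-machine sketch, assuming equal block sizes and that each compare-exchange of the one-element algorithm is replaced by a compare-split; the names `block_bitonic_sort` and `keep_half` are hypothetical.

```python
def keep_half(mine, theirs, keep_low):
    """Compare-split seen from one process: merge both blocks, keep one half."""
    merged = sorted(mine + theirs)            # both blocks are sorted; a linear merge would do
    k = len(mine)
    return merged[:k] if keep_low else merged[len(merged) - k:]

def block_bitonic_sort(blocks):
    """Simulate hypercube bitonic sort with one block of n/p elements per process."""
    p = len(blocks)
    d = p.bit_length() - 1
    assert p >= 1 and p == 1 << d, "number of processes must be a power of two"
    x = [sorted(b) for b in blocks]           # local sort
    for i in range(d):
        for j in range(i, -1, -1):            # log^2 p compare-split steps in total
            x = [
                keep_half(x[pid], x[pid ^ (1 << j)],
                          ((pid >> (i + 1)) & 1) == ((pid >> j) & 1))
                for pid in range(p)
            ]
    return x
```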

self-test questions

how does the complexity of bitonic sort change when implemented on other topologies?
compare the number of comparisons and communication of mergesort with bitonic.
which architectures is bitonic sort best suited for?
