sorting

mergesort

mergesort - example

mergesort


b = Merge_Sort(a, n)
  if n < 100
    return seqSort(a, n);
  b1 = Merge_Sort(a[0,…,n/2-1], n/2);
  b2 = Merge_Sort(a[n/2,…,n-1], n - n/2);
  return Merge(b1, b2);

work-optimal parallel merge


partition \(B\) into blocks with \(\log m\) elements

rank splitters of \(B\) in \(A\)

merge blocks \(B_i\) and \(A_i\) sequentially

  • partition \(B\) into \(m/\log m\) blocks, each with \(K=\log m\) elements
  • parallel for \(i=1:m/\log m\)
    • \(r_i = \texttt{seq\_rank}(b_{iK}, A)\), the rank of splitter \(b_{iK}\) in \(A\)
  • partition \(A\) accordingly
    • block \(A_i: (a_{r_{i-1}+1},\cdots,a_{r_i})\)
  • merge blocks \(A_i\) and \(B_i\) sequentially in \(\mathcal{O}(\log n)\) time
  • but, if \(|A_i|\gg|B_i|=\log m\) then par_merge\((B_i, A_i)\)
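
the steps above can be sketched in Python as follows. this is a minimal sketch, not a tuned implementation: the name `par_merge_sketch` is hypothetical, the "parallel for" is written as an ordinary loop whose iterations are independent, ranking uses binary search from the standard `bisect` module, and the recursive case for very large \(A_i\) is left out.

```python
import math
from bisect import bisect_right
from heapq import merge as seq_merge


def par_merge_sketch(A, B):
    """Merge sorted lists A and B by partitioning B into blocks of ~log m."""
    m = len(B)
    if m == 0:
        return list(A)
    K = max(1, int(math.log2(m)))  # block size, about log m
    out = []
    prev_rank = 0                  # r_{i-1}: where the previous block of A ended
    for start in range(0, m, K):   # stands in for the "parallel for"
        B_i = B[start:start + K]
        if start + K >= m:
            r_i = len(A)           # the last block takes the rest of A
        else:
            # rank of the splitter (last element of B_i) in A
            r_i = bisect_right(A, B_i[-1])
        A_i = A[prev_rank:r_i]
        out.extend(seq_merge(A_i, B_i))  # sequential merge of one block pair
        prev_rank = r_i
    return out
```

each rank depends only on its own splitter, so in a parallel setting all ranks could be computed at once before the block merges start.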

basics

background


  • input specification
    • each process has \(n/p\) elements
    • an ordering of the processes is specified
  • output specification
    • each process will get \(n/p\) consecutive elements of the final sorted array
    • which chunk is determined by the process ordering

basic operation

compare-split
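
a minimal Python sketch of compare-split, assuming the usual definition: two neighbouring processes each hold a sorted block, exchange blocks, and the process earlier in the ordering keeps the smaller half while the other keeps the larger half. the function name and the two-list interface are illustrative; a real implementation would exchange the blocks by message passing.

```python
def compare_split(low_block, high_block):
    """Return (new_low, new_high) for two sorted blocks; sizes are preserved."""
    merged = sorted(low_block + high_block)  # a linear-time merge would also do
    k = len(low_block)
    return merged[:k], merged[k:]


low, high = compare_split([1, 6, 8, 11, 13], [2, 7, 9, 10, 12])
```

after the call every element of `low` is at most every element of `high`, which is the block-level analogue of a single compare-exchange.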


sorting networks


  • sorting is one of the fundamental problems in computer science
  • for a long time researchers have focused on the question “how fast can we sort \(n\) elements?”
  • serial
    • \(\Omega(n\log n)\) lower bound for comparison-based sorting
  • parallel
    • \(\mathcal{O}(1), \mathcal{O}(\log n), \mathcal{O}(???)\)
  • sorting networks
    • custom-made hardware for sorting!
      • hardware & algorithm
      • mostly of theoretical interest but fun to study!

elements of sorting networks


key idea

perform many comparisons in parallel


key elements

comparators and network topology

bitonic sort


a sorting network with \(\mathcal{O}(\log^2n)\) columns

bitonic sequence


a bitonic sequence is a sequence of elements \((a_0,a_1,\ldots,a_{n-1})\) with the property that either (1) there exists an index \(i, 0\leq i\leq n-1\), such that \((a_0,\ldots,a_i)\) is monotonically increasing and \((a_{i+1},\ldots,a_{n-1})\) is monotonically decreasing, or (2) there exists a cyclic shift of indices so that (1) is satisfied.

why bitonic sequences?


a bitonic sequence can be easily sorted in increasing/decreasing order


Let \(s=(a_0,\ldots,a_{n-1})\) be a bitonic sequence such that \[a_0\leq a_1 \leq \cdots \leq a_{n/2-1}\] and \[a_{n/2}\geq a_{n/2+1} \geq \cdots \geq a_{n-1}.\] Consider the following subsequences of \(s\):

\[s_1 \leftarrow ( \min(a_0, a_{n/2}), \min(a_1, a_{n/2+1}), \ldots, \min(a_{n/2-1}, a_{n-1}) ) \]

\[s_2 \leftarrow ( \max(a_0, a_{n/2}), \max(a_1, a_{n/2+1}), \ldots, \max(a_{n/2-1}, a_{n-1}) ) \]

why bitonic sequences?


  • every element of \(s_1\) will be \(\leq\) every element of \(s_2\)
  • both \(s_1\) and \(s_2\) are bitonic sequences
  • so how can we sort bitonic sequences?
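
a small Python check of these two properties on one concrete bitonic sequence. the helper name `bitonic_split` and the sample values are chosen for illustration only.

```python
def bitonic_split(s):
    """Split a bitonic sequence of even length into (s1, s2)."""
    half = len(s) // 2
    s1 = [min(s[i], s[i + half]) for i in range(half)]
    s2 = [max(s[i], s[i + half]) for i in range(half)]
    return s1, s2


s = [3, 5, 8, 9, 7, 4, 2, 1]   # increasing, then decreasing
s1, s2 = bitonic_split(s)
print(s1, s2)                  # [3, 4, 2, 1] [7, 5, 8, 9]
```

here the largest element of `s1` is 4 and the smallest element of `s2` is 5, and each half is again bitonic (`s2` after a cyclic shift), so the same split can be applied recursively to each half.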

bitonic merging network

a comparator network that takes as an input a bitonic sequence and performs a sequence of bitonic splits to sort it

figure: bitonic merging network \(\oplus\)BM[16] (sorts in increasing order)

are we done?


  • given a set of elements, how do we re-arrange them into a bitonic sequence?
  • key idea

    • use successively larger bitonic networks to transform the set into a bitonic sequence

make bitonic

complexity


how many columns of comparators are required to sort \(n=2^k\) elements?

in other words the depth \(d(n)\) of the network?

\[ d(n) = d(n/2) + \log n = \mathcal{O}(\log^2 n)\]
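
unrolling the recurrence makes the bound concrete. assuming a single comparator sorts two elements, so \(d(2)=1\), and writing \(k=\log_2 n\):

\[ d(2^k) = d(2^{k-1}) + k = 1 + 2 + \cdots + k = \frac{k(k+1)}{2} = \frac{\log n\,(\log n+1)}{2} \]

for example, \(n=16\) gives \(k=4\) and \(d(16)=10\) columns.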

bitonic sort

bitonic [b1, b2] = split (bitonic b) {
  n = length(b);
  for (i=0; i<n/2; ++i) {
    b1[i] = min( b[i], b[i+n/2] );
    b2[i] = max( b[i], b[i+n/2] );
  }
}

bitonic [b1, b2] = reverse_split (bitonic b) {
  [b2, b1] = split(b);
}

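sequence s = sort_bitonic(bitonic b) {
  if (length(b) <= 1) return b;   // base case: nothing left to split
  [b1, b2] = split(b);            // every element of b1 <= every element of b2
  s = [sort_bitonic(b1), sort_bitonic(b2)];
}

sequence s = reverse_sort_bitonic(bitonic b) {
  if (length(b) <= 1) return b;   // base case: nothing left to split
  [b1, b2] = reverse_split(b);    // every element of b1 >= every element of b2
  s = [reverse_sort_bitonic(b1), reverse_sort_bitonic(b2)];
}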

bitonic sort

bitonic b = make_bitonic(sequence a) {
  n = length(a);
  if (n <= 2) return a;   // at most two elements: already bitonic
  a1 = a[  0,...,n/2-1];
  a2 = a[n/2,...,n-1];
  
  b1 = make_bitonic(a1);
  b2 = make_bitonic(a2);
  
  b = merge_bitonic(b1,b2);
}

bitonic b = merge_bitonic(b1, b2) {
  b1 = sort_bitonic(b1);
  b2 = reverse_sort_bitonic(b2);
  b = [b1, b2];
}
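
the pseudocode above can be turned into a short runnable Python sketch. it assumes the input length is a power of two, adds the base cases the recursion needs, and folds `reverse_split` and `reverse_sort_bitonic` into an `ascending` flag, which is a choice of this sketch rather than part of the slides.

```python
def split(b):
    """Bitonic split: pairwise minima and maxima of the two halves."""
    half = len(b) // 2
    lo = [min(b[i], b[i + half]) for i in range(half)]
    hi = [max(b[i], b[i + half]) for i in range(half)]
    return lo, hi


def sort_bitonic(b, ascending=True):
    """Sort a bitonic sequence by recursive bitonic splits."""
    if len(b) <= 1:
        return list(b)
    lo, hi = split(b)
    first, second = (lo, hi) if ascending else (hi, lo)
    return sort_bitonic(first, ascending) + sort_bitonic(second, ascending)


def make_bitonic(a):
    """Rearrange a into a bitonic sequence: ascending half, then descending half."""
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    b1 = make_bitonic(a[:half])
    b2 = make_bitonic(a[half:])
    return sort_bitonic(b1, True) + sort_bitonic(b2, False)


def bitonic_sort(a):
    """Full bitonic sort; len(a) is assumed to be a power of two."""
    return sort_bitonic(make_bitonic(a))
```

for example, `bitonic_sort([10, 20, 5, 9, 3, 8, 12, 14])` first builds a bitonic sequence and then applies one final bitonic merge to it.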

bitonic sort on a hypercube


  • one element per process case
  • how do we map the algorithm onto a hypercube?
    • what is the comparator?
    • how do the wires get mapped?

communication pattern

communication characteristics of bitonic sort on a hypercube. during each stage of the algorithm, processes communicate along the dimensions shown.

bitonic sort on a hypercube

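function y = hcube_bitonic_sort(p, id, x)
  d = log2(p);                    % hypercube dimension, p = 2^d processes
  
  for i=0:d-1                     % stage i merges bitonic sequences of length 2^(i+1)
    for j=i:-1:0                  % exchange along dimension j
      partner = flip_bit(id, j);  % id xor 2^j
      if id.bit(i+1) == id.bit(j)
        x = comp_exchange_min(x, partner);  % keep the smaller element
      else
        x = comp_exchange_max(x, partner);  % keep the larger element
      end
    end
  end
  y = x;
end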
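
the per-process code can be checked with a sequential simulation. the Python sketch below is an assumption-laden stand-in for the message-passing version: `x[pid]` plays the role of the single element held by process `pid`, the exchange with the partner is modelled by reading the partner's value from the previous step, and the function name is hypothetical.

```python
def hcube_bitonic_sort_sim(values):
    """Simulate bitonic sort on a hypercube with one element per process."""
    p = len(values)
    d = p.bit_length() - 1
    assert p == 1 << d, "number of processes must be a power of two"
    x = list(values)
    for i in range(d):
        for j in range(i, -1, -1):
            nxt = list(x)
            for pid in range(p):                 # all processes act in lockstep
                partner = pid ^ (1 << j)         # flip bit j of the process id
                bit_i1 = (pid >> (i + 1)) & 1
                bit_j = (pid >> j) & 1
                if bit_i1 == bit_j:
                    nxt[pid] = min(x[pid], x[partner])  # comp_exchange_min
                else:
                    nxt[pid] = max(x[pid], x[partner])  # comp_exchange_max
            x = nxt
    return x
```

in the last stage bit \(i+1=d\) of every id is zero, so the rule reduces to "the process with bit \(j\) clear keeps the minimum", which yields an increasing order across process ids.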

more than one element per process


hypercube


\[ T_p = \underbrace{\mathcal{O}\left(\frac{n}{p}\log \frac{n}{p}\right)}_{\text{local sort}} + \underbrace{\mathcal{O}\left(\frac{n}{p}\log^2 p\right)}_{\text{comparisons}} + \underbrace{\mathcal{O}\left(\frac{n}{p}\log^2 p\right)}_{\text{communication}} \]

self-test questions


  • how does the complexity of bitonic sort change when implemented on other topologies?
  • compare the number of comparisons and communication of mergesort with bitonic.
  • which architectures is bitonic sort best suited for?