CS 6230: Parallel Computing and High Performance Computing

Midterm Exam

Due Mar 31, 11:59pm

Please submit a single pdf named uid_midterm.pdf. Let the uid start with a lowercase u followed by 7 digits. Submit/upload via canvas. All problems are worth 10 points.

You are given a one dimensional array that may contain both positive and negative integers, develop a parallel algorithm to find the sum of contiguous subarray of numbers which has the largest sum. For example, if the given array is \([-2, -5, \underline{6}, \underline{-2}, \underline{-3}, \underline{1}, \underline{5}, -6]\), then the maximum subarray sum is 7 (underlined numbers).
Suppose we are given a set of \(n\) elements stored in an array \(A\) together with an array \(L\) suchthat \(L(i)\in \{1,2,\ldots, k\}\) represents the label of element \(A(i)\), where \(k\) is a constant. Develop an optimal \(\mathcal{O}(\log n)\) time EREW PRAM algorithm that stores all the elements of \(A\) with label \(1\) into the upper part of \(A\) while preserving their initial ordering, followed by the elements labeled \(2\) with the same initial ordering and so on. For example, \[ A = [6, 5, 3, 9, 11, 12, 8, 17, 21, 2]\\ L=[1,1,2,3,2,1,1,2,3,3]\\ \] produces \[ A=[6,5,12,8,3,11,17,9,21,2].\]
Develop a work and depth optimal parallel algorithm for computing the first \(n\) Fibonacci numbers.
Suppose an \(n\times n\) matrix is embedded in a hypercube (we assume that \(n\) is a power of 2). Find an algorithm for transposing this matrix in \(\mathcal{O}(\log n)\) time. (Hint: \(2^k = n^2 = 2^{2q}\))
Let \(p(x) = a_0x^n + a_1x^{n-1} + \cdots + a_{n-1}x + a_n\) be a given polynomial. Horner’s algorithm to compute \(p(x)\) at a point \(x_0\) is based on rewriting the expression for p(x_0) as follows: \[ p(x_0) = (\cdots((a_0x_0 + a_1)x_0 + a_2)x_0 + \cdots + a_{n-1})x_0 + a_n. \] An obvious sequential algorithm for this problem has \(\mathcal{O}(n)\) complexity. Is it possible to develop a work optimal parallel algorithm whose complexity is \(\mathcal{O}(n/p + \log n)\) for \(p\) processors. Give pseudocode for the best possible parallel algorithm.