Hari Sundar


Assignment 2 - Matrices & Graphs

Due Sep 14, 11:59pm ## Matrix-Vector and Matrix-Matrix Multiplications Consider an \(n \times m\) matrix \(A\) and a vector \(x\in \mathbb{R}^m\). We are interested in computing the matrix-vector product (matvec) \(b=Ax\). This is a pattern that will repeat itself several times in data analytics and therefore you should have an efficient implementation for this case. Also implement the Matrix-Matrix multiplication (matmult) of matrices \(A (m\times n)\) and \(B (n\times p)\). Both algorithms will be easy to implement using spark. Now recall from Assignment 1 and our analysis in class that it can be very inefficient to parallelize (using map-reduce) at the level of each \(i,j\) value. Modify both matvec as well as matmult to operate in blocks. The map and reduce operations (at the spark level) should be limited to the number of blocks. Internal block matvecs and matmults should be performed using python/scala/java. You can use libraries such as numpy or implement your own sequential versions. Simple Datasets are provided. Matrices and vectors of size 100, 200, 2000 and 10,000 are provided allowing combinations for both the matvec and matmult cases. Each text file has a single entry per line, containing i j m_ij for the matrix and simply x_i for the vector. You can generate additional examples yourselves. I am fine with you modifying the data format if that makes your job easier. In such cases, do submit the code to convert the standard format to your format. ## Shallow Graphs For a given undirected graph \(G = (V,E)\) with \(n\) vertices and \(m\) edges (\(m ≥ n\)), we say that \(G\) is shallow if for every pair of vertices \(u,v \in V\), there is a path from \(u\) to \(v\) of length at most 2 (i.e., using at most two edges). You will write a program that takes a graph as an input and determines whether the graph is shallow or not. Graphs can be easily represented as an Adjacency matrix \(A\), where \(A_{ij} = 1\) if an edge exists between vertices \(i\) and \(j\) and zero otherwise. Similarly, \(A^2_{ij}\) contains the number of paths of length 2 from node \(i\) to \(j\). Thus \((A^2 + A)_{ij}\) contains the number of paths of length at most 2 from \(i\) to \(j\). Use this property to implement your shallow graph detector. Can do device an algorithm to do better than computing \(A^2\)? Graph Datasets are available. The format is similar to the matrix, of this format, vertex_a vertex_b edge_wt ## Hints … Here are a few things to consider …. 1. Storing your matrix blocks as np.matrix and solving part 1 correctly will allow you to easily accomodate the sparse graphs by replacing np.matrix by a sparse matrix, for example from scipy.sparse 2. In order to keep your code generic, estimate the dimensions of the matrix after reading in the data. 3. Make sure you check your results against a sequential implementation. Here is my code, python def load_matrix(fname, sz): M = np.matrix(np.zeros(sz)) with open(fname) as f: content = f.readlines() for l in content: s = l.split() M[int(s[0]), int(s[1])] = float(s[2]) return M A = load_matrix("a2/mats/a_100x200.txt", (100, 200)) B = load_matrix("a2/mats/b_200x100.txt", (200, 100)) C = A*B print C 4. Experiment with different block sizes. Do you need to make changes to your code to accomodate arbitrary block sizes? 5. For the detection of shallow graphs consider whether it is easier to implement \(A^2+A\) or \(A(A+I)\), where \(I\) is the identity matrix. — Submit your code and a short report describing the choices you made while implementing the algorithms. Submissions should be done via canvas.