Hari Sundar


Assignment 0 - Focused Web Crawling

Due Aug 31

Please direct your questions about this assignment to the TA.

## Install Spark on your Laptop/Workstation

Install Spark on your personal machine or any machine that you have easy access to. While you can use the elephant cluster for development, developing on a local install is going to be much more convenient. We will only run the large problems on Elephant, i.e., Assignments 3 and 4 and your Final Project. For Assignments 3 and 4, I will provide you with smaller test datasets that will allow you to develop on your personal machines.

The Spark version installed on the elephant cluster is 1.2.1. It is best if everyone installs Spark 1.2.1 on their personal machines. Here are some online links to install Spark on a Mac, Linux and on Windows. You can find other tutorials for installation online. Only email the TA if you have tried hard and are unable to get it to work.

Make sure that you are able to run the examples on Spark QuickStart. I will be using Python during examples shown in class. You are free to use Java, Scala or Python. Just make sure you are able to run the standard examples.

## Submission

For the submission you only need to email the TA confirming that you installed Spark and are able to run the simple examples. Also mention which language (Scala/Python/Java) you used and intend to use for the subsequent assignments. You will be penalized 5% for not completing this assignment on time.
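
Before sending the confirmation email, one way to check a local install is a small script modeled on the self-contained application in the Spark QuickStart. The sketch below is only an illustration and is not required for the assignment. The file name `install_check.py`, the application name `InstallCheck`, and the helper names `count_matching` and `run_check` are hypothetical. It assumes the script is launched with `bin/spark-submit` so that `pyspark` is importable.

```python
from __future__ import print_function  # PySpark in Spark 1.2.1 runs on Python 2

import sys


def count_matching(lines, letter):
    """Count the elements of `lines` (an RDD of strings) that contain `letter`."""
    return lines.filter(lambda line: letter in line).count()


def run_check(path):
    """Start a local SparkContext and count lines in `path` containing 'a' and 'b'."""
    # Imported here so the file can still be loaded where pyspark is missing.
    from pyspark import SparkContext

    sc = SparkContext("local", "InstallCheck")  # hypothetical application name
    try:
        lines = sc.textFile(path).cache()
        num_a = count_matching(lines, "a")
        num_b = count_matching(lines, "b")
        print("Lines with a: %d, lines with b: %d" % (num_a, num_b))
    finally:
        sc.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        run_check(sys.argv[1])
    else:
        print("usage: spark-submit install_check.py <path-to-text-file>")
```

Run from the Spark directory, an invocation such as `bin/spark-submit install_check.py README.md` should print the two counts if the install works. Any text file can stand in for `README.md`.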