CS 5968/6968 – Data Str & Alg Scalable Comp
Lectures: MoWe / 11:50AM-01:10PM MT at GC 2760
Instructor: Prashant Pandey
Teaching Assistant:
We will use Piazza for all Q&A. Piazza CS 6968
Course Overview
This course studies advanced data structures and algorithms for handling scalability challenges in large-scale data analysis and machine learning pipelines. It will cover modern hashing techniques, filters and sketching algorithms, locality-sensitive hashing, succinct data structures, string algorithms, graph algorithms, external memory algorithms, and learned indexes. This course is appropriate for both undergraduate and graduate students with intermediate data structure and algorithm skills. The course also requires intermediate programming skills in C/C++.
Prerequisites
Official Prerequisites:
For undergrads: CS 4150 (Undergrad algorithms)
For grads: CS 6150 (Grad algorithms)
Course Topics
Assignments
Projects
Paper Reading
There is a set of assigned paper readings for the course. The reading list is designed to provide additional information and insight into current state-of-the-art research in data structures and algorithms. Each student is required to pick five papers from the reading list and turn in a one-paragraph synopsis of each. There will be five deadlines throughout the semester, and a synopsis is due at each one. Late submissions will not be accepted without prior approval from the instructor. Each review must include the following information:
These reading reviews must be your own writing. You may not copy from the papers or other sources that you find on the web. Plagiarism will not be tolerated.
Scribing
Useful Resources
Please refer to this brief overview of asymptotic notation: The Asymptotic Cheat Sheet. It will help you follow the theoretical analyses in the course. Assignments, scribe notes, and final projects must be typeset in LaTeX. If you are not familiar with LaTeX, see this introduction. Here's a quick Overleaf tutorial.
Grading
Late submission policy
Collaboration and Plagiarism
Everyone needs to read the SoC Policy on Academic Misconduct. Working with others on assignments is a good way to learn the material, and we encourage it. However, there are limits to the degree of cooperation that we will permit. When working on programming assignments, you must work only with others whose understanding of the material is approximately equal to yours. In this situation, working together to find a good approach for solving a programming problem is cooperation; listening while someone dictates a solution is cheating. You must limit collaboration to a high-level discussion of solution strategies, and stop short of actually writing down a group answer.
Anything that you hand in, whether it is a paper report or a computer program, must be written in your own words. If you base your solution on any other written solution, you are cheating. If you collaborate with other students to discuss a problem and then write your own solution, declare the names of all the students you collaborated with at the top of your write-up. Never look at another student's code or share your code with any other student. You must not make your code public (on GitHub or by any other means). Using tools like GitHub Copilot or ChatGPT, or copying code from sites like Stack Overflow, also constitutes cheating. Do not write code with Copilot enabled in this course.
We do not distinguish between cheaters who copy others' work and cheaters who allow their work to be copied. If you cheat, you will be given an E in the course and referred to the University Student Behavior Committee. Any attempt to subvert the ordinary grading process also constitutes cheating. If you have any questions about what constitutes cheating, please ask first.