The class project aims to extend and practice the knowledge and skills you learn in the class. The project is worth 30% of the class grade. You can choose between two types of class project, described as follows.


1. Reading Project

You can select three recently published probabilistic learning papers to read, make sure you fully understand them, and submit paper summaries as the report. The project must be completed individually. The papers must be published in top machine learning conferences, including NeurIPS, ICML, AISTATS, UAI and ICLR, and no earlier than 2005. Journal papers are also acceptable, provided you choose them from JMLR, TPAMI, JASA, Annals of Statistics, JRSB, and Bayesian Analysis.

Grading and Milestones

The grading is broken into the following milestones:

  1. Paper choices (10 points): Notify the instructor of the papers you want to read. Note that if the papers do not fit the probabilistic learning topic, you will be asked to select new papers.

  2. Mid-term report (30 points): Submit the summary of one paper.

  3. Final report (60 points): Submit the summary of the remaining two papers.

  4. Each paper summary must be at least 2 full pages long (size 11 font). The summary should include the following information: Note that (1) a summary shorter than 2 full pages will NOT receive any credit, and (2) you cannot copy sentences from the original paper. If we find such sentences, the summary will not receive any credit.

2. Research Project

You can use probabilistic learning approaches to explore or address a research task. You can form a group for the project. Each project group consists of at most two students.

Grading and Milestones

Please create a GitHub repository to update and maintain your project. The grading is broken into the following milestones:

  1. Project team (5 points): Notify the instructor of the members of your project group.

  2. Mid-term report (30 points): Submit a description of your project of at most 5 pages (size 11 font). The description should include the following information:
    • A brief introduction to the problem you want to solve using probabilistic learning techniques (10%)
    • The motivation: why do you want to use learning techniques? Why not the traditional or existing methods? (10%)
    • What you have done to reach your goal. Note that just “We collected data” will NOT be enough (40%)
    • Your detailed plan for the rest of the project (30%)
    • References to the literature (10%)
  3. Final report (65 points): The final report can be up to six pages long (size 11 font) and should be structured as a small research paper. It should consist of the following content:
    1. Problem definition and motivation - what problem did you choose? Why is it important or interesting? Why did you use machine learning techniques to solve it? (20 points)
    2. Your solution - the details of the machine learning models/algorithms you chose/developed (or proofs for theoretical projects) (20 points).
    3. Experimental evaluation (20 points)
    4. Future plan (5 points)

    For a theoretical project, the solution and experimental evaluation will be graded as one component, worth 40 points. Note: the final report must include a link to the GitHub repository that contains the implementation of your project. We will check your implementation as well. A missing GitHub link will result in a grade of zero for the final report.

Topics

Any project using machine learning as a critical step or component will be fine.

If you are looking for ideas for possible projects, come to the office hours and we can brainstorm. Projects can be one of:

  1. An application project, e.g., some machine learning application that you find interesting.

  2. Reproduction of published results, e.g., you are interested in a machine learning paper and want to reimplement its model/algorithm to reproduce its experimental results.

  3. A theoretical project, e.g., prove interesting properties of a learning algorithm.

  4. An algorithmic project, e.g., develop a new learning algorithm for a particular type of problem.

  5. Your own research, e.g., if you are already working on some project and wish to apply machine learning methods.

In general, choose topics that you find exciting, and convince me that the topic is important/interesting.

Important: Experimental evaluations should be rigorous, i.e., choose fair baselines, apply cross-validation for hyper-parameter selection, and report both positive and negative results.
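
As one possible illustration of the evaluation protocol described above, the following is a minimal sketch, not a required template. It assumes a toy 1-D ridge regression on synthetic data; the function names, the regularization grid, and the data are all hypothetical and chosen only for illustration. The hyper-parameter is selected by k-fold cross-validation on the training split only, the held-out test split is used once, and a trivial baseline (predicting the training mean) is reported next to the model.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """1-D ridge regression without intercept: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, lam, k=5, seed=0):
    """Average validation MSE of one hyper-parameter value over k folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        scores.append(mse(w, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / k


# Hypothetical synthetic data: y = 2x + Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(120)]
ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]

# The test split is set aside and never used for hyper-parameter selection.
xs_tr, ys_tr, xs_te, ys_te = xs[:90], ys[:90], xs[90:], ys[90:]

# Select the regularization strength on the training split only.
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_scores = {lam: cross_validate(xs_tr, ys_tr, lam) for lam in grid}
best_lam = min(cv_scores, key=cv_scores.get)

# Refit on the full training split and evaluate once on the test split.
w = fit_ridge_1d(xs_tr, ys_tr, best_lam)
test_mse = mse(w, xs_te, ys_te)

# Trivial baseline: always predict the mean of the training targets.
mean_y = sum(ys_tr) / len(ys_tr)
baseline_mse = sum((mean_y - y) ** 2 for y in ys_te) / len(ys_te)

# Report every grid value, including the ones that performed poorly.
for lam in grid:
    print(f"lambda={lam}: cross-validation MSE={cv_scores[lam]:.4f}")
print(f"selected lambda={best_lam}, test MSE={test_mse:.4f}, baseline MSE={baseline_mse:.4f}")
```

The same pattern carries over to real models: the test data stays out of the selection loop, and the scores of all candidate settings are reported, not only the best one.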

Project Examples

  • Kaggle competition tasks
  • Biology and medical study: can we select genes relevant to some disease, such as breast cancer or Alzheimer's Disease?
  • Stock market prediction: can we predict the trend of the stock price (going up or down)?
  • Commodity recommendation: can we use customers' purchase records to recommend commodities to old and new customers?
  • Software and security: can we identify Android apps that contain malware?
  • Sentiment analysis: can we classify whether a comment is positive or negative?
  • Spam email detection.
  • ...