DAIR-3, Day 2: Building Robust ML Models

Date: August 5th, 2025, from 8:30 to 11:45 AM
Instructor: Suraj Rampure (rampure@umich.edu)
Program Website

In this session, we will review the basics of predictive modeling and approaches to building accurate, reproducible models; introduce reporting best practices that allow others to interpret and reproduce our results; and discuss guiding principles for reproducing others’ results.

Please fill out the Welcome Survey before the session begins.

There are four Python-based Jupyter Notebooks for this session:

1. Overview of Machine Learning and Tools (Jupyter Notebooks, sklearn, Logistic Regression, Train-Test Splits)
2. Dimensionality Reduction (Random Seeds, Stratification, PCA, MDS, t-SNE)
3. Model Selection (Cross-Validation, Regularization, Feature Standardization, Data Leakage)
4. Model Evaluation (Precision, Recall, ROC-AUC, More on Class Imbalance)
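To give a sense of how several of these topics fit together, here is a minimal scikit-learn sketch. It assumes scikit-learn's bundled breast cancer dataset and a few arbitrary choices (a 25% test set, seed 42, a small grid of C values); the workshop notebooks may use different data and settings. It combines a seeded, stratified train-test split, a StandardScaler placed inside a pipeline so that cross-validation does not leak information from the validation folds, regularization strength tuned by 5-fold cross-validation, and precision, recall, and ROC-AUC computed on the held-out test set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # fixed random seed so the split (and thus the results) are reproducible

# Assumed example data: a small binary classification dataset bundled with sklearn.
X, y = load_breast_cancer(return_X_y=True)

# Stratify so the train and test sets keep the same class proportions as y.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=SEED
)

# The scaler lives inside the pipeline, so during cross-validation it is fit
# only on each training fold, never on the fold used for validation.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation over C, the inverse regularization strength.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.01, 0.1, 1, 10]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

# Evaluate the refit best model once on the held-out test set.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]  # probability of class 1
metrics = {
    "best_C": search.best_params_["logisticregression__C"],
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
print(metrics)
```

Changing SEED changes which rows land in the test set, so the exact metric values will shift slightly; fixing it is what makes a reported number reproducible.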

Don’t worry if you’re not familiar with Python: most of the code is already provided for you. Focus on understanding the concepts and what the provided code does.

There are two ways to access these notebooks. To get the most out of the workshop, I recommend following at least one of them so that you can run the code and experiment on your own.

Option 1: Web-Based Access

Click here to open the notebooks in your browser, without the need to install anything.

This link uses mybinder.org, a free service that runs Jupyter Notebooks in your browser. Some code may run slowly or not work properly, but this should suffice for the workshop.

Option 2: Local Installation

You can also clone our GitHub repository and run the notebooks locally. This option is recommended if you’ve run Jupyter Notebooks locally before; if you haven’t, the web-based option is probably easier.

Find the repository here. There is a requirements.txt file in the repository that you can use to install the necessary dependencies.

In your Terminal, run the following commands:

git clone https://github.com/surajrampure/dair3-2025.git
cd dair3-2025
pip install -r requirements.txt
cd files

You can then open the notebooks in your browser by running jupyter notebook in your Terminal.
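If you installed locally, a short Python script can confirm that the dependencies are present before you launch Jupyter. This is a sketch: the package list below is a hypothetical guess at what requirements.txt contains, so edit EXPECTED_PACKAGES to match the actual file. It uses only importlib.metadata from the standard library (Python 3.8 or later) and should be run with the same Python environment you used for pip install.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of distribution names; adjust to match requirements.txt.
EXPECTED_PACKAGES = ("numpy", "pandas", "scikit-learn", "matplotlib", "notebook")


def check_environment(packages=EXPECTED_PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in check_environment().items():
        status = ver if ver is not None else "NOT INSTALLED"
        print(f"{name:15s} {status}")
```

Any line reading NOT INSTALLED suggests the pip install step ran in a different environment or failed partway through.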

If you’d like more detailed steps on how to run Jupyter Notebooks locally, refer to this guide.