Statistical Frontiers in Foundation Models and LLMs

STAT 992, UW–Madison, Department of Statistics, 2025

In this graduate seminar we will explore statistical frontiers in foundation models and large language models (LLMs).

Course Description

Statistical Frontiers in Foundation Models and LLMs explores the emerging intersection between modern statistical thinking and the capabilities of large-scale foundation models such as large language models (LLMs), vision-language models (VLMs), and diffusion models. While foundation models have achieved remarkable empirical success, they raise fundamental questions about uncertainty, reliability, generalization, and inference, which are core concerns of the statistical sciences. This course examines how statistical tools and perspectives can help us rigorously evaluate, understand, and extend the capabilities of these models, and how, in turn, foundation models may offer new tools for statisticians.

We begin by introducing foundation models from a statistical viewpoint, emphasizing why concepts such as calibration, entropy, and generalization remain central in the age of large-scale deep learning. Students will critically examine techniques for evaluating model reliability, including calibration error, posterior consistency, and related statistical diagnostics. The seminar then turns to modern approaches for uncertainty quantification in generative models and LLMs, such as conformal prediction and deep ensembles, highlighting both the theoretical underpinnings and the practical challenges of deploying these systems in safety-critical domains. Finally, we explore how foundation models can themselves perform statistical tasks, such as assisting in Bayesian inference, simulation-based inference, and prediction-powered inference, marking a shift from models as objects of analysis to models as computational subroutines in statistical workflows.

Throughout the course, students will engage deeply with contemporary research papers at the frontier of statistics and machine learning, with an emphasis on developing the tools to evaluate and innovate in this rapidly evolving landscape.

Topics

Topic 1: What Are Foundation Models and Why Should Statisticians Care?

Title: Should Statisticians Care About Foundation Models?

We set the stage for the seminar by defining what foundation models are (e.g., LLMs, VLMs, diffusion models), how they differ from traditional statistical models, and why their empirical power raises new theoretical and practical questions. We also discuss the role of statistical thinking in evaluating and repurposing these models for scientific inference, decision-making, and societal impact.

Potential Readings:

Topic 2: Statistical Evaluation of Foundation Models

Title: Measuring What Matters: Statistical Lenses on Model Behavior

Foundation models are often evaluated in terms of benchmarks or task accuracy—but what does it mean for them to be statistically reliable or trustworthy? This module explores evaluation through the lens of calibration, entropy, consistency, and statistical diagnostics. We treat models as black boxes and use classical statistical tools to probe the quality of their predictions and generations.
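
To make this concrete, here is a minimal sketch of expected calibration error (ECE), the gap between a model's reported confidence and its realized accuracy. The equal-width binning and the toy overconfident "model" are illustrative assumptions, not any particular paper's protocol.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error (ECE) with equal-width confidence bins.

    confidences: predicted probability of the chosen label, shape (n,)
    correct:     1 if the prediction was right, else 0, shape (n,)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |average accuracy - average confidence|, weighted by bin mass
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy usage on synthetic data: a model whose confidence runs about 15
# points above its actual accuracy, i.e., an overconfident model.
rng = np.random.default_rng(0)
conf = rng.uniform(0.7, 1.0, size=1000)
hit = rng.binomial(1, conf - 0.15)
print(f"ECE: {expected_calibration_error(conf, hit):.3f}")
```

A virtue of this black-box view is that nothing about the model enters except its confidences and outcomes, which is exactly the setting for evaluating closed-weight LLMs.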

We may also read papers on watermarking. Watermarking straddles the line between statistical evaluation of foundation models and intervention: statistical signals can be embedded into model outputs, intentionally or inadvertently, and such signals can be detected, analyzed, and potentially used for provenance, accountability, and robustness.
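
As a minimal illustration of the detection side, the sketch below performs the one-sided z-test used in "green list" token watermarks, in the spirit of Kirchenbauer et al. The 0/1 indicators are supplied directly here as an assumption; a real detector would recompute them from the text using the watermark's keyed hash.

```python
import math

def greenlist_z_score(flags, gamma=0.5):
    """One-sided z-test for a token-level 'green list' watermark.

    flags: 0/1 indicators, 1 if a generated token fell in the green list
    gamma: fraction of the vocabulary assigned to the green list per step

    Under the null (unwatermarked text), each flag is Bernoulli(gamma),
    so the green count has mean gamma*T and variance gamma*(1-gamma)*T.
    """
    T = len(flags)
    green = sum(flags)
    return (green - gamma * T) / math.sqrt(gamma * (1 - gamma) * T)

# Toy usage: 200 tokens with 70% green hits vs. the 50% expected by chance.
flags = [1] * 140 + [0] * 60
print(f"z = {greenlist_z_score(flags):.2f}")  # ~5.66: strong evidence of a watermark
```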

Potential Readings:

Topic 3: Statistical Methods for Uncertainty Quantification

Title: What Do You (Think You) Know? Quantifying Uncertainty in Foundation Models

When foundation models are deployed in high-stakes settings, uncertainty is as important as accuracy. This module focuses on statistical techniques for quantifying predictive uncertainty—including conformal prediction, deep ensembles, Laplace approximations, and attention-based uncertainty metrics. We discuss both theoretical guarantees and empirical behavior under distribution shift.
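
As one example, split conformal prediction wraps any black-box predictor with a finite-sample marginal coverage guarantee. Below is a minimal sketch on synthetic regression data; the sine-curve "model" and the data-generating process are placeholders standing in for an arbitrary foundation model.

```python
import numpy as np

def split_conformal_halfwidth(residuals, alpha=0.1):
    """Interval half-width from split conformal prediction.

    residuals: |y_i - f(x_i)| on a held-out calibration set, shape (n,)
    alpha:     target miscoverage; intervals f(x) +/- q cover y with
               probability >= 1 - alpha, marginally, in finite samples
    """
    n = len(residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # conformal quantile index
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(residuals)[k - 1]

# Toy usage: treat np.sin as the black-box predictor on synthetic data.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=500)
y = np.sin(x) + rng.normal(0.0, 0.2, size=500)
predict = np.sin
cal_res = np.abs(y[:250] - predict(x[:250]))       # calibration half
q = split_conformal_halfwidth(cal_res, alpha=0.1)
covered = np.abs(y[250:] - predict(x[250:])) <= q  # evaluation half
print(f"half-width {q:.3f}, empirical coverage {covered.mean():.3f}")
```

Note that the guarantee is distribution-free but assumes exchangeability between calibration and test points, which is precisely what breaks under the distribution shift discussed above.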

Potential Readings:

Topic 4: Foundation Models for Statistical Inference

Title: From Model to Oracle: Using Foundation Models in the Service of Inference

Beyond being objects of analysis, foundation models can act as tools in statistical workflows. This module explores how LLMs and generative models can assist in Bayesian inference, simulation-based inference (SBI), and prediction-powered inference. We consider the opportunities and pitfalls of using models as simulators, priors, or query engines in structured inference problems.
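
For a concrete instance, the sketch below implements the simplest prediction-powered inference (PPI) mean estimator: average the model's predictions on a large unlabeled sample, then debias with a "rectifier" estimated from a small labeled sample. The simulated data and the Gaussian confidence interval reflect the most basic form of the method; real deployments would refine both.

```python
import numpy as np
from scipy.stats import norm

def ppi_mean_ci(y_lab, f_lab, f_unlab, alpha=0.05):
    """Prediction-powered point estimate and CI for a population mean.

    y_lab:   gold labels on a small labeled sample, shape (n,)
    f_lab:   model predictions on that same labeled sample, shape (n,)
    f_unlab: model predictions on a large unlabeled sample, shape (N,)

    Estimator: mean(f_unlab) + mean(y_lab - f_lab); the second term
    (the "rectifier") corrects the bias of the model's predictions.
    """
    n, N = len(y_lab), len(f_unlab)
    rectifier = y_lab - f_lab
    theta = f_unlab.mean() + rectifier.mean()
    se = np.sqrt(rectifier.var(ddof=1) / n + f_unlab.var(ddof=1) / N)
    z = norm.ppf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)

# Toy usage: a biased, noisy "model" of a 0/1 outcome with true mean 0.4.
rng = np.random.default_rng(2)
N, n, true_theta = 10_000, 100, 0.4
y_unlab = rng.binomial(1, true_theta, size=N)       # labels never observed
f_unlab = 0.8 * y_unlab + rng.normal(0.1, 0.2, N)   # only predictions used
y_lab = rng.binomial(1, true_theta, size=n)
f_lab = 0.8 * y_lab + rng.normal(0.1, 0.2, n)
theta, (lo, hi) = ppi_mean_ci(y_lab, f_lab, f_unlab)
print(f"estimate {theta:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

The design choice worth discussing in seminar is the division of labor: the foundation model contributes statistical power through the large unlabeled sample, while validity rests entirely on the small gold-labeled sample.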

Potential Readings: