Large Population Models

How can we build truly data-driven expert systems that anyone can query to obtain uncertainty-calibrated answers to a wide range of questions about statistical populations? To that end, we introduce Large Population Models (LPMs), a framework with two components: a generative model, implemented as a probabilistic program, that accurately and interpretably captures the distribution of the original dataset; and a SQL-like query language for performing inference in that model, which lets users interpret and audit its results TODO link. In two case studies, on clinical trials and US demographic data, we show that LPMs generate high-quality synthetic data that can substitute for or enrich real data TODO link, produce calibrated uncertainty estimates TODO, compute conditional mutual information between variables, and generate data that respects privacy constraints.
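To make the two kinds of inference mentioned above concrete, here is a minimal Python sketch of a conditional probability query and a conditional mutual information computation over a tiny discrete joint distribution. It is not the LPM query language or its generative model: the table `JOINT`, the variable names, and the helpers `prob` and `conditional_mutual_information` are all hypothetical, and the probabilities are invented for illustration rather than taken from either case study.

```python
import math
from itertools import product

# Hypothetical toy population over three variables. All numbers are made up.
VARIABLES = ("age", "voted", "region")
JOINT = {
    ("young", "yes", "urban"): 0.15,
    ("young", "no", "urban"): 0.15,
    ("old", "yes", "urban"): 0.10,
    ("old", "no", "urban"): 0.10,
    ("young", "yes", "rural"): 0.16,
    ("young", "no", "rural"): 0.04,
    ("old", "yes", "rural"): 0.24,
    ("old", "no", "rural"): 0.06,
}


def _matches(row, condition):
    """True if the row agrees with every (variable, value) pair in condition."""
    return all(row[VARIABLES.index(name)] == value
               for name, value in condition.items())


def prob(event, given=None):
    """Return P(event | given); both arguments map variable names to values."""
    given = given or {}
    denominator = sum(p for row, p in JOINT.items() if _matches(row, given))
    numerator = sum(p for row, p in JOINT.items()
                    if _matches(row, given) and _matches(row, event))
    return numerator / denominator


def _values(name):
    """All values that a variable takes in the toy table."""
    return sorted({row[VARIABLES.index(name)] for row in JOINT})


def conditional_mutual_information(x, y, z):
    """Return I(x; y | z) in nats, by direct summation over the joint table."""
    total = 0.0
    for vx, vy, vz in product(_values(x), _values(y), _values(z)):
        p_xyz = prob({x: vx, y: vy, z: vz})
        if p_xyz == 0.0:
            continue  # zero-probability cells contribute nothing
        p_xy_given_z = prob({x: vx, y: vy}, {z: vz})
        p_x_given_z = prob({x: vx}, {z: vz})
        p_y_given_z = prob({y: vy}, {z: vz})
        total += p_xyz * math.log(p_xy_given_z / (p_x_given_z * p_y_given_z))
    return total


if __name__ == "__main__":
    print(prob({"voted": "yes"}, {"age": "young"}))
    print(conditional_mutual_information("age", "voted", "region"))
```

In this toy table, `age` and `voted` were constructed to be independent within each region, so their conditional mutual information given `region` is zero up to floating-point error, even though the two variables are correlated marginally. A real LPM would answer such queries by inference in a learned probabilistic program rather than by summing over an explicit table.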

Data-driven expert systems based on LPMs

Differences from LLMs, neuro-symbolic LLM systems, and Bayesian statistics

US Elections LPM

say some words about the kinds of queries each LPM will be capable of answering

- Check out US Elections

Clinical Trials LPM

say some words about it

- Check out Clinical Trials