Megan Richards

About Me

I’m a Computer Science PhD student at NYU’s Courant Institute of Mathematical Sciences, working with Kyunghyun Cho. My PhD is supported by NYU’s Dean’s Doctoral Fellowship and the NSF Graduate Research Fellowship (GRFP). Previously, I was an AI Resident at Meta AI (FAIR labs), where I was fortunate to be advised by Mark Ibrahim and Diane Bouchacourt. I graduated with departmental distinction from Duke University, where I studied Electrical and Computer Engineering with a concentration in Machine Learning.

My work at NYU focuses on building more scalable, general-purpose statistical tools for problems in causal inference and information theory.

At FAIR, I studied the reliability of vision-language models (during which I became convinced that we needed better tools to measure datasets). My work at FAIR included studying the increasing divergence between ImageNet-based benchmarks and global, crowdsourced data (ICLR 2024), investigating why vision models failed to generalize to images from non-Western countries (Spotlight, NeurIPS 2023), and designing metrics for image generation (Outstanding Paper, TiFA workshop ICML 2024). I recently shared an overview of this line of work at CVPR’s DemoDiv workshop (slides).

I became motivated to work in machine learning research after experiences building models in healthcare settings. At Duke, I worked with Mark Sendak at the Duke Institute for Health Innovation (DIHI) to build risk prediction models for severe pregnancy complications. While at Duke, I also worked at the Duke Center for Global Women’s Health Technologies on a self-screening device for cervical cancer designed for low-resource global settings, which earned a Best Research award at NIH’s IEEE HIPOCT Conference in 2019.

Recent Updates

June ‘25: 📝 I’ll be at ICML in Seoul presenting two workshop papers! At SPIGM, I’ll be presenting work on meta-learning mutual information estimators (full version published at TMLR in May), and at FMSD, I’ll be presenting work on benchmarking foundation models for synthetic control (a causal inference problem based on observational time-series data).
May ‘25: 📝 Our most recent work, MIST: Mutual Information via Supervised Training, is now published in TMLR! We meta-learn mutual information estimators, building estimators that learn to estimate MI directly from samples (rather than estimating densities or density ratios through bounds). Our method achieves substantial gains in high-dimensional, low-sample settings, with orders-of-magnitude improvements in inference efficiency.
June ‘25: Delighted to be speaking at CVPR’s DemoDiv workshop about some of our work studying geographic underrepresentation in computer vision. I made my presentation publicly available here!
Nov ‘24: 📝 Check out our new NAACL ‘25 work On the Role of Speech Data in Reducing Toxicity Detection Bias, led by Samuel Bell! We generate and release a new set of multilingual toxicity annotations for MuTox, and find that when models have access to the audio itself, rather than a transcript, they are more accurate and less biased in detecting toxicity (with respect to group mentions).

Publications

MIST: Mutual Information via Supervised Training
German Gritsai^*, Megan Richards^*, Maxime Méloux^*, Kyunghyun Cho, Maxime Peyrard
^*joint first author
[Transactions on Machine Learning Research (TMLR)]
On the Role of Speech Data in Reducing Toxicity Detection Bias
Samuel J. Bell^*, Mariano Coria Meglioli^*, Megan Richards^*, Eduardo Sánchez^*,
Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-jussà^*
^*core contributor
NAACL 2025
[ArXiv]
Decomposed Evaluations of Geographic Disparities in Text-To-Image Models
Abhishek Sureddy^*, Dishant Padalia^*, Nandhinee Periyakaruppa^*, Oindrila Saha, Adina Williams, Adriana Romero-Soriano,
Megan Richards^**, Polina Kirichenko^**, Melissa Hall^**
^*joint first author ^**joint senior author
Outstanding Paper, Trustworthy Multi-modal Foundation Models and AI Agents (TiFA) Workshop, ICML 2024.
Next Generation of AI Safety Workshop, ICML 2024.
[ArXiv]
An Introduction to Vision-Language Modeling
Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, …, Megan Richards, …, Kate Saenko, Asli Celikyilmaz, Vikas Chandra
[ArXiv]
Does Progress On Object Recognition Benchmarks Improve Generalization on Crowdsourced, Global Data?
Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
ICLR ‘24
[ICLR ‘24]
Exploring Why Object Recognition Performance Degrades Across Income Levels and Geographies
Laura Gustafson, Megan Richards, Melissa Hall, Caner Hazirbas, Diane Bouchacourt, Mark Ibrahim
[(Spotlight) NeurIPS 2023 Datasets and Benchmarks].
Development and Validation of ML-DQA – a Machine Learning Data Quality Assurance Framework for Healthcare
Mark Sendak, Gaurav Sirdeshmukh, Timothy Ochoa, Hayley Premo, Linda Tang, Kira Niederhoffer, Sarah Reed, Kaivalya Deshpande, Emily Sterrett, Melissa Bauer, Laurie Snyder, Afreen Shariff, David Whellan, Jeffrey Riggio, David Gaieski, Kristin Corey, Megan Richards, Michael Gao, Marshall Nichols, Bradley Heintze, William Knechtle, William Ratliff, Suresh Balu
Machine Learning for Healthcare, 2022
[PMLR]
Multicontrast Pocket Colposcopy Cervical Cancer Diagnostic Algorithm for Referral Populations
Erica Skerrett, Zichen Miao, Mercy N Asiedu, Megan Richards, Brian Crouch, Guillermo Sapiro, Qiang Qiu, Nirmala Ramanujam
BME Frontiers, 2022
[BME Frontiers]

Posters

MIST: Mutual Information Estimators via Supervised Training
German Gritsai^*, Megan Richards^*, Maxime Méloux^*, Kyunghyun Cho, Maxime Peyrard
^*joint first author
Structured Probabilistic Inference & Generative Modeling Workshop, ICML 2026
SCBench: A Testbed for Causal Inference with Time Series Panel Data
Megan Richards, Saeyoung Rho, Kyunghyun Cho
Foundation Models for Structured Data, ICML 2026
Does Progress On Object Recognition Benchmarks Improve Generalization on Crowdsourced, Global Data?
Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
Data Centric Machine Learning (DMLR) Workshop, ICML 2023
Towards Deploying Predictive Models for Maternal Health
Kaivalya Deshpande, Willie Boag, Freya Gulamali, Megan Richards, Michael Gao, Namita Kansal, Vaishakhi Mayya, Mark Sendak, Ashraf Habib, Terrence Allen, Sarah McWay Boling, Melissa Bauer, Jennifer Gilner, Brenna Hughes, Courtney Mitchell, Heather Tally, Amanda Craig, Suresh Balu, William Knechtle
Machine Learning for Healthcare, 2023
Phenotype Development and Validation for a Maternal Early Warning System
Megan Richards, Michael Gao, William Knechtle, Namita Kansal, Vaishakhi Mayya, Mark Sendak, Ashraf Habib, Terrence Allen, Sarah McWay Boling, Melissa Bauer, Jennifer Gilner, Courtney Mitchell
Machine Learning for Healthcare, 2022
Development of a Speculum-Free Liquid Applicator for At-Home Cervical Cancer Screening
Erica Skerrett, Mercy N Asiedu, Megan Richards, John Wilson Schmitt, Nirmala Ramanujam
Best Poster, NIH IEEE HIPOCT Conference, 2019

Talks

CVPR 2025, DemoDiv Workshop
Geographic Underrepresentation in Computer Vision
Slides

Organizing

I’m really excited by efforts to make machine learning and science more inclusive, and am proud to be part of the following efforts:

Organizer, Queer In AI 🌈
I helped organize the QinAI NeurIPS 2023 workshop - see our NeurIPS website here and our org website here.
Discussion Lead, Women-In-Machine-Learning (WiML) at ICML 2023
I helped organize a breakout session on robustness and large-scale vision models at the Women in Machine Learning workshop at ICML 2023 (slides).

Service

Reviewer, ICLR Workshops 2024
I was a reviewer for the Workshops at ICLR 2024. Excited to see so many great new avenues of research!
Reviewer, DMLR Workshop at ICML 2023
I was a reviewer for the DMLR workshop at ICML 2023. See more about the workshop here.