About Me
I’m a Computer Science PhD student at NYU’s Courant Institute of Mathematical Sciences, working with Kyunghyun Cho. My PhD is supported by NYU’s Dean’s Doctoral Fellowship and the NSF Graduate Research Fellowship (GRFP). Previously, I was an AI Resident at Meta AI (FAIR labs), where I was fortunate to be advised by Mark Ibrahim and Diane Bouchacourt. I graduated with departmental distinction from Duke University, where I studied Electrical and Computer Engineering with a concentration in Machine Learning.
My current work at NYU focuses on building more scalable, general-purpose statistical tools, with the goal of improving the study of modern ML datasets and models.
At FAIR, I studied the reliability of vision-language models (during which I became convinced that we needed better tools to measure datasets). My work at FAIR included studying the increasing divergence between ImageNet-based benchmarks and global, crowdsourced data (ICLR 2024), investigating why vision models failed to generalize to images from non-Western countries (Spotlight, NeurIPS 2023), and designing metrics for image generation (Outstanding Paper, TiFA workshop ICML 2024). I recently shared an overview of this line of work at CVPR’s DemoDiv workshop (slides).
I became motivated to work in machine learning research (and model reliability more specifically) after experiences building models in healthcare settings. At Duke, I worked with Mark Sendak as part of the Duke Institute for Healthcare Innovation (DIHI), to build risk prediction models for severe pregnancy complications, which are now integrated and in silent trials. While at Duke, I also worked at the Duke Center for Global Women’s Health Technologies on a self-screening device for cervical cancer designed for low-resource global settings, which earned a Best Research award at NIH’s IEEE HIPOCT Conference in 2019.
Recent Updates
Nov ‘25: 📝 Check out our most recent preprint, MIST: Mutual Information via Supervised Training! We meta-learn mutual information estimators, building estimators that learn to estimate MI directly from samples (rather than computing density or density ratios through bounds). Our method achieves substantial gains for high-dimensional, low-sample settings, with massive improvements in inference efficiency.
June ‘25: Delighted to be speaking at CVPR’s DemoDiv workshop about some of our work studying geographic underrepresentation in computer vision. I made my presentation publicly available here!
Nov ‘24: 📝 Check out our new NAACL ‘25 work On the Role of Speech Data in Reducing Toxicity Detection Bias, led by Samuel Bell! We generate and release a new set of multilingual toxicity annotations for MuTox, and find that when models have access to the audio itself, rather than a transcript, they are more accurate and less biased in detecting toxicity (w.r.t. group mentions).
July ‘24: 🎊 We’re honored to receive an outstanding paper award at the ICML TiFA workshop for our work measuring geographic disparities in image generations! It was such a pleasure to help supervise this project, led by Abhishek Sureddy, Dishant Padalia, and Nandhinee Periyakaruppa.
May ‘24: 📝 Check out our Introduction to Vision-Language Modeling, created through a broad collaboration of researchers (> 40 people across 10 institutions) to help democratize knowledge about VLMs!
April ‘24: 🎓 This fall, I will start a PhD at NYU Computer Science, working with Kyunghyun Cho. My work will be supported by NYU’s GSAS Dean’s Doctoral Fellowship, as well as the NSF Graduate Research Fellowship (GRFP).
Publications
MIST: Mutual Information via Supervised Training
German Gritsai*, Megan Richards*, Maxime Méloux*, Kyunghyun Cho, Maxime Peyrard
*joint first author
[ArXiv]
On the Role of Speech Data in Reducing Toxicity Detection Bias
Samuel J. Bell*, Mariano Coria Meglioli*, Megan Richards*, Eduardo Sánchez*,
Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-jussà*
*core contributor
NAACL 2025
[ArXiv]
Decomposed Evaluations of Geographic Disparities in Text-To-Image Models
Abhishek Sureddy*, Dishant Padalia*, Nandhinee Periyakaruppa*, Oindrila Saha, Adina Williams, Adriana Romero-Soriano,
Megan Richards**, Polina Kirichenko**, Melissa Hall**
*joint first author **joint senior author
Outstanding Paper, Trustworthy Multi-modal Foundation Models and AI Agents (TiFA) Workshop, ICML 2024.
Next Generation of AI Safety Workshop, ICML 2024.
[ArXiv]
An Introduction to Vision-Language Modeling
Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, …, Megan Richards, …, Kate Saenko, Asli Celikyilmaz, Vikas Chandra
[ArXiv]
Does Progress On Object Recognition Benchmarks Improve Generalization on Crowdsourced, Global Data?
Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
ICLR ‘24
[ICLR ‘24]
Exploring Why Object Recognition Performance Degrades Across Income Levels and Geographies
Laura Gustafson, Megan Richards, Melissa Hall, Caner Hazirbas, Diane Bouchacourt, Mark Ibrahim
[(Spotlight) NeurIPS 2023 Datasets and Benchmarks]
Development and Validation of ML-DQA – a Machine Learning Data Quality Assurance Framework for Healthcare
Mark Sendak, Gaurav Sirdeshmukh, Timothy Ochoa, Hayley Premo, Linda Tang, Kira Niederhoffer, Sarah Reed, Kaivalya Deshpande, Emily Sterrett, Melissa Bauer, Laurie Snyder, Afreen Shariff, David Whellan, Jeffrey Riggio, David Gaieski, Kristin Corey, Megan Richards, Michael Gao, Marshall Nichols, Bradley Heintze, William Knechtle, William Ratliff, Suresh Balu
Machine Learning for Healthcare, 2022
[PMLR]
Multicontrast Pocket Colposcopy Cervical Cancer Diagnostic Algorithm for Referral Populations
Erica Skerrett, Zichen Miao, Mercy N Asiedu, Megan Richards, Brian Crouch, Guillermo Sapiro, Qiang Qiu, Nirmala Ramanujam
BME Frontiers, 2022
[BME Frontiers]
Posters
Does Progress On Object Recognition Benchmarks Improve Generalization on Crowdsourced, Global Data?
Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
Poster, Data Centric Machine Learning (DMLR) Workshop, ICML 2023
Towards Deploying Predictive Models for Maternal Health
Kaivalya Deshpande, Willie Boag, Freya Gulamali, Megan Richards, Michael Gao, Namita Kansal, Vaishakhi Mayya, Mark Sendak, Ashraf Habib, Terrence Allen, Sarah McWay Boling, Melissa Bauer, Jennifer Gilner, Brenna Hughes, Courtney Mitchell, Heather Tally, Amanda Craig, Suresh Balu, William Knechtle
Poster, Machine Learning for Healthcare, 2023
Phenotype Development and Validation for a Maternal Early Warning System
Megan Richards, Michael Gao, William Knechtle, Namita Kansal, Vaishakhi Mayya, Mark Sendak, Ashraf Habib, Terrence Allen, Sarah McWay Boling, Melissa Bauer, Jennifer Gilner, Courtney Mitchell
Poster, Machine Learning for Healthcare, 2022
Development of a Speculum-Free Liquid Applicator for At-Home Cervical Cancer Screening
Erica Skerrett, Mercy N Asiedu, Megan Richards, John Wilson Schmitt, Nirmala Ramanujam
Best Poster, NIH IEEE HIPOCT Conference, 2019
Talks
- CVPR 2025, DemoDiv Workshop
Geographic Underrepresentation in Computer Vision
Slides
Organizing
I’m really excited by efforts to make machine learning/science more inclusive, and am proud to be part of the following efforts:
Organizer, Queer In AI 🌈
I helped organize the QinAI NeurIPS 2023 workshop - see our NeurIPS website here and our org website here.
Discussion Lead, Women-In-Machine-Learning (WiML) at ICML 2023
I helped organize a breakout session on robustness and large-scale vision models at the Women in Machine Learning workshop at ICML 2023 (slides).
Service
Reviewer, ICLR Workshops 2024
I was a reviewer for the Workshops at ICLR 2024. Excited to see so many great new avenues of research!
Reviewer, DMLR Workshop at ICML 2023
I was a reviewer for the DMLR workshop at ICML 2023. See more about the workshop here.
