About Me

I’m a Computer Science PhD student at NYU’s Courant Institute of Mathematical Sciences, working with Kyunghyun Cho. My PhD is supported by NYU’s Dean’s Doctoral Fellowship and the NSF Graduate Research Fellowship (GRFP). Previously, I was an AI Resident at Meta AI (FAIR labs), where I was fortunate to be advised by Mark Ibrahim and Diane Bouchacourt. I graduated with departmental distinction from Duke University, where I studied Electrical and Computer Engineering with a concentration in Machine Learning.

My current work at NYU focuses on building more scalable, general-purpose statistical tools, with the goal of improving the study of modern ML datasets and models.

At FAIR, I studied the reliability of vision-language models (during which I became convinced that we needed better tools to measure datasets). My work at FAIR included studying the increasing divergence between ImageNet-based benchmarks and global, crowdsourced data (ICLR 2024), investigating why vision models failed to generalize to images from non-Western countries (Spotlight, NeurIPS 2023), and designing metrics for image generation (Outstanding Paper, TiFA workshop ICML 2024). I recently shared an overview of this line of work at CVPR’s DemoDiv workshop (slides).

I became motivated to work in machine learning research (and model reliability more specifically) after experiences building models in healthcare settings. At Duke, I worked with Mark Sendak as part of the Duke Institute for Healthcare Innovation (DIHI) to build risk prediction models for severe pregnancy complications, which are now integrated and in silent trials. While at Duke, I also worked at the Duke Center for Global Women’s Health Technologies on a self-screening device for cervical cancer designed for low-resource global settings, which earned a Best Research award at NIH’s IEEE HIPOCT Conference in 2019.

Recent Updates

  • Nov ‘25: 📝 Check out our most recent preprint, MIST: Mutual Information via Supervised Training! We meta-learn mutual information estimators, building estimators that learn to estimate MI directly from samples (rather than computing density or density ratios through bounds). Our method achieves substantial gains for high-dimensional, low-sample settings, with massive improvements in inference efficiency.

  • June ‘25: Delighted to be speaking at CVPR’s DemoDiv workshop about some of our work studying geographic underrepresentation in computer vision. I made my presentation publicly available here!

  • Nov ‘24: 📝 Check out our new NAACL ‘25 work On the Role of Speech Data in Reducing Toxicity Detection Bias, led by Samuel Bell! We generate and release a new set of multilingual toxicity annotations for MuTox, and find that when models have access to the audio itself, rather than a transcript, they are more accurate and less biased in detecting toxicity (w.r.t. group mentions).

  • July ‘24: 🎊 We’re honored to receive an outstanding paper award at the ICML TiFA workshop for our work measuring geographic disparities in image generations! It was such a pleasure to help supervise this project, led by Abhishek Sureddy, Dishant Padalia, and Nandhinee Periyakaruppa.

  • May ‘24: 📝 Check out our Introduction to Vision-Language Modeling, created through a broad collaboration of researchers (> 40 people across 10 institutions) to help democratize knowledge about VLMs!

  • April ‘24: 🎓 This fall, I will start a PhD at NYU Computer Science, working with Kyunghyun Cho. My work will be supported by NYU’s GSAS Dean’s Doctoral Fellowship, as well as the NSF Graduate Research Fellowship (GRFP).

Publications

  • MIST: Mutual Information via Supervised Training
    German Gritsai*, Megan Richards*, Maxime Méloux*, Kyunghyun Cho, Maxime Peyrard
    *joint first author
    [ArXiv]

  • On the Role of Speech Data in Reducing Toxicity Detection Bias
    Samuel J. Bell*, Mariano Coria Meglioli*, Megan Richards*, Eduardo Sánchez*,
    Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-jussà*
    *core contributor
    NAACL 2025
    [ArXiv]

  • Decomposed Evaluations of Geographic Disparities in Text-To-Image Models
    Abhishek Sureddy*, Dishant Padalia*, Nandhinee Periyakaruppa*, Oindrila Saha, Adina Williams, Adriana Romero-Soriano,
    Megan Richards**, Polina Kirichenko**, Melissa Hall**
    *joint first author **joint senior author
    Outstanding Paper, Trustworthy Multi-modal Foundation Models and AI Agents (TiFA) Workshop, ICML 2024.
    Next Generation of AI Safety Workshop, ICML 2024.
    [ArXiv]

  • An Introduction to Vision-Language Modeling
    Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, …, Megan Richards, …, Kate Saenko, Asli Celikyilmaz, Vikas Chandra
    [ArXiv]

  • Does Progress On Object Recognition Benchmarks Improve Generalization on Crowdsourced, Global Data?
    Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
    ICLR ‘24
    [ICLR ‘24]

  • Exploring Why Object Recognition Performance Degrades Across Income Levels and Geographies
    Laura Gustafson, Megan Richards, Melissa Hall, Caner Hazirbas, Diane Bouchacourt, Mark Ibrahim
    [(Spotlight) NeurIPS 2023 Datasets and Benchmarks].

  • Development and Validation of ML-DQA – a Machine Learning Data Quality Assurance Framework for Healthcare
    Mark Sendak, Gaurav Sirdeshmukh, Timothy Ochoa, Hayley Premo, Linda Tang, Kira Niederhoffer, Sarah Reed, Kaivalya Deshpande, Emily Sterrett, Melissa Bauer, Laurie Snyder, Afreen Shariff, David Whellan, Jeffrey Riggio, David Gaieski, Kristin Corey, Megan Richards, Michael Gao, Marshall Nichols, Bradley Heintze, William Knechtle, William Ratliff, Suresh Balu
    Machine Learning for Healthcare, 2022
    [PMLR]

  • Multicontrast Pocket Colposcopy Cervical Cancer Diagnostic Algorithm for Referral Populations
    Erica Skerrett, Zichen Miao, Mercy N Asiedu, Megan Richards, Brian Crouch, Guillermo Sapiro, Qiang Qiu, Nirmala Ramanujam
    BME Frontiers, 2022
    [BME Frontiers]

Posters

  • Does Progress On Object Recognition Benchmarks Improve Generalization on Crowdsourced, Global Data?
    Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
    Poster, Data-Centric Machine Learning Research (DMLR) Workshop, ICML 2023

  • Towards Deploying Predictive Models for Maternal Health
    Kaivalya Deshpande, Willie Boag, Freya Gulamali, Megan Richards, Michael Gao, Namita Kansal, Vaishakhi Mayya, Mark Sendak, Ashraf Habib, Terrence Allen, Sarah McWay Boling, Melissa Bauer, Jennifer Gilner, Brenna Hughes, Courtney Mitchell, Heather Tally, Amanda Craig, Suresh Balu, William Knechtle
    Poster, Machine Learning for Healthcare, 2023

  • Phenotype Development and Validation for a Maternal Early Warning System
    Megan Richards, Michael Gao, William Knechtle, Namita Kansal, Vaishakhi Mayya, Mark Sendak, Ashraf Habib, Terrence Allen, Sarah McWay Boling, Melissa Bauer, Jennifer Gilner, Courtney Mitchell
    Poster, Machine Learning for Healthcare, 2022

  • Development of a Speculum-Free Liquid Applicator for At-Home Cervical Cancer Screening
    Erica Skerrett, Mercy N Asiedu, Megan Richards, John Wilson Schmitt, Nirmala Ramanujam
    Best Poster, NIH IEEE HIPOCT Conference, 2019

Talks

  • CVPR 2025, DemoDiv Workshop
    Geographic Underrepresentation in Computer Vision
    Slides

Organizing

I’m really excited by efforts to make machine learning/science more inclusive, and am proud to be part of the following efforts:

  • Organizer, Queer In AI 🌈
    I helped organize the QinAI NeurIPS 2023 workshop - see our NeurIPS website here and our org website here.

  • Discussion Lead, Women-In-Machine-Learning (WiML) at ICML 2023
    I helped organize a breakout session on robustness and large-scale vision models at the Women in Machine Learning workshop at ICML 2023 (slides).

Service

  • Reviewer, ICLR Workshops 2024
    I was a reviewer for the Workshops at ICLR 2024. Excited to see so many great new avenues of research!

  • Reviewer, DMLR Workshop at ICML 2023
    I was a reviewer for the DMLR workshop at ICML 2023. See more about the workshop here.