About Me

I’m a Computer Science PhD student at NYU’s Courant Institute of Mathematical Sciences, working with Kyunghyun Cho. My PhD is supported by NYU’s Dean’s Doctoral Fellowship and the NSF Graduate Research Fellowship (GRFP). Previously, I was an AI Resident at Meta AI (FAIR labs), where I was fortunate to be advised by Mark Ibrahim and Diane Bouchacourt. I graduated with departmental distinction from Duke University, where I studied Electrical and Computer Engineering with a concentration in Machine Learning.

My work at NYU focuses on building more scalable, general-purpose statistical tools for problems in causal inference and information theory.

At FAIR, I studied the reliability of vision-language models (during which I became convinced that we needed better tools to measure datasets). My work at FAIR included studying the growing divergence between ImageNet-based benchmarks and global, crowdsourced data (ICLR 2024), investigating why vision models fail to generalize to images from non-Western countries (Spotlight, NeurIPS 2023), and designing metrics for image generation (Outstanding Paper, TiFA Workshop, ICML 2024). I recently shared an overview of this line of work at CVPR’s DemoDiv workshop (slides).

I became motivated to pursue machine learning research after building models in healthcare settings. At Duke, I worked with Mark Sendak at the Duke Institute for Health Innovation (DIHI) to build risk prediction models for severe pregnancy complications. I also worked at the Duke Center for Global Women’s Health Technologies on a self-screening device for cervical cancer designed for low-resource global settings, which earned a Best Research award at the NIH IEEE HIPOCT Conference in 2019.

Recent Updates

  • June ‘26: 📝 I’ll be at ICML in Seoul presenting two workshop papers! At SPIGM, I’ll present our work on meta-learning mutual information estimators (the full version was published in TMLR in May), and at FMSD, I’ll present our work benchmarking foundation models for synthetic control (a causal inference problem based on observational time-series data).

  • May ‘26: 📝 Our most recent work, MIST: Mutual Information via Supervised Training, is now published in TMLR! We meta-learn mutual information estimators: our estimators learn to predict MI directly from samples, rather than estimating densities or density ratios or relying on bounds. Our method achieves substantial gains in high-dimensional, low-sample settings, with orders-of-magnitude improvements in inference efficiency.

  • June ‘25: Delighted to be speaking at CVPR’s DemoDiv workshop about some of our work studying geographic underrepresentation in computer vision. I made my presentation publicly available here!

  • Nov ‘24: 📝 Check out our new NAACL ‘25 paper, On the Role of Speech Data in Reducing Toxicity Detection Bias, led by Samuel Bell! We generate and release a new set of multilingual toxicity annotations for MuTox, and find that when models have access to the audio itself rather than a transcript, they detect toxicity more accurately and with less bias (w.r.t. group mentions).

Publications

  • MIST: Mutual Information via Supervised Training
    German Gritsai*, Megan Richards*, Maxime Méloux*, Kyunghyun Cho, Maxime Peyrard
    *joint first author
    [Transactions on Machine Learning Research (TMLR)]

  • On the Role of Speech Data in Reducing Toxicity Detection Bias
    Samuel J. Bell*, Mariano Coria Megliol*, Megan Richards*, Eduardo Sánchez*,
    Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-jussà*
    *core contributor
    NAACL 2025
    [ArXiv]

  • Decomposed Evaluations of Geographic Disparities in Text-To-Image Models
    Abhishek Sureddy*, Dishant Padalia*, Nandhinee Periyakaruppa*, Oindrila Saha, Adina Williams, Adriana Romero-Soriano,
    Megan Richards**, Polina Kirichenko**, Melissa Hall**
    *joint first author **joint senior author
    Outstanding Paper, Trustworthy Multi-modal Foundation Models and AI Agents (TiFA) Workshop, ICML 2024.
    Next Generation of AI Safety Workshop, ICML 2024.
    [ArXiv]

  • An Introduction to Vision-Language Modeling
    Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, …, Megan Richards, …, Kate Saenko, Asli Celikyilmaz, Vikas Chandra
    [ArXiv]

  • Does Progress On Object Recognition Benchmarks Improve Generalization on Crowdsourced, Global Data?
    Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
    ICLR ‘24
    [ICLR ‘24]

  • Exploring Why Object Recognition Performance Degrades Across Income Levels and Geographies
    Laura Gustafson, Megan Richards, Melissa Hall, Caner Hazirbas, Diane Bouchacourt, Mark Ibrahim
    [(Spotlight) NeurIPS 2023 Datasets and Benchmarks]

  • Development and Validation of ML-DQA – a Machine Learning Data Quality Assurance Framework for Healthcare
    Mark Sendak, Gaurav Sirdeshmukh, Timothy Ochoa, Hayley Premo, Linda Tang, Kira Niederhoffer, Sarah Reed, Kaivalya Deshpande, Emily Sterrett, Melissa Bauer, Laurie Snyder, Afreen Shariff, David Whellan, Jeffrey Riggio, David Gaieski, Kristin Corey, Megan Richards, Michael Gao, Marshall Nichols, Bradley Heintze, William Knechtle, William Ratliff, Suresh Balu
    Machine Learning for Healthcare, 2022
    [PMLR]

  • Multicontrast Pocket Colposcopy Cervical Cancer Diagnostic Algorithm for Referral Populations
    Erica Skerrett, Zichen Miao, Mercy N Asiedu, Megan Richards, Brian Crouch, Guillermo Sapiro, Qiang Qiu, Nirmala Ramanujam
    BME Frontiers, 2022
    [BME Frontiers]

Posters

  • MIST: Mutual Information Estimators via Supervised Training
    German Gritsai*, Megan Richards*, Maxime Méloux*, Kyunghyun Cho, Maxime Peyrard
    *joint first author
    Structured Probabilistic Inference & Generative Modeling Workshop, ICML 2026

  • SCBench: A Testbed for Causal Inference with Time Series Panel Data
    Megan Richards, Saeyoung Rho, Kyunghyun Cho
    Foundation Models for Structured Data, ICML 2026

  • Does Progress On Object Recognition Benchmarks Improve Generalization on Crowdsourced, Global Data?
    Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
    Data-centric Machine Learning Research (DMLR) Workshop, ICML 2023

  • Towards Deploying Predictive Models for Maternal Health
    Kaivalya Deshpande, Willie Boag, Freya Gulamali, Megan Richards, Michael Gao, Namita Kansal, Vaishakhi Mayya, Mark Sendak, Ashraf Habib, Terrence Allen, Sarah McWay Boling, Melissa Bauer, Jennifer Gilner, Brenna Hughes, Courtney Mitchell, Heather Tally, Amanda Craig, Suresh Balu, William Knechtle
    Machine Learning for Healthcare, 2023

  • Phenotype Development and Validation for a Maternal Early Warning System
    Megan Richards, Michael Gao, William Knechtle, Namita Kansal, Vaishakhi Mayya, Mark Sendak, Ashraf Habib, Terrence Allen, Sarah McWay Boling, Melissa Bauer, Jennifer Gilner, Courtney Mitchell
    Machine Learning for Healthcare, 2022

  • Development of a Speculum-Free Liquid Applicator for At-Home Cervical Cancer Screening
    Erica Skerrett, Mercy N Asiedu, Megan Richards, John Wilson Schmitt, Nirmala Ramanujam
    Best Poster, NIH IEEE HIPOCT Conference, 2019

Talks

  • CVPR 2025, DemoDiv Workshop
    Geographic Underrepresentation in Computer Vision
    Slides

Organizing

I’m really excited by efforts to make machine learning and science more inclusive, and am proud to be part of the following efforts:

  • Organizer, Queer in AI 🌈
    I helped organize the Queer in AI workshop at NeurIPS 2023; see our NeurIPS website here and our organization website here.

  • Discussion Lead, Women in Machine Learning (WiML) at ICML 2023
    I helped organize a breakout session on robustness and large-scale vision models at the Women in Machine Learning workshop at ICML 2023 (slides).

Service

  • Reviewer, ICLR Workshops 2024
    I was a reviewer for the workshops at ICLR 2024. Excited to see so many great new avenues of research!

  • Reviewer, DMLR Workshop at ICML 2023
    I was a reviewer for the DMLR workshop at ICML 2023. See more about the workshop here.