Benjamin Plaut

About me

I am a postdoctoral research fellow at the Center for Human-Compatible AI (CHAI) at UC Berkeley, where I study AI safety. I am grateful to be mentored by Stuart Russell. In 2021, I received my PhD from the Computer Science Department at Stanford University, advised by Ashish Goel and supported by an NSF Graduate Research Fellowship. I also spent two years doing science and product work at Lyft.

I want to use my career to do good in the world. In general, I am interested in ensuring that technology is safe and beneficial. During my PhD, I explored this goal from an economic perspective: how to design markets and algorithms that achieve good outcomes even when people are selfish. Over time, I became increasingly concerned about the rapid pace of AI development, which led me to pivot to the field of AI safety. I'm concerned about a wide range of risks from AI, including but not limited to serious LLM errors, critical infrastructure failures, societal-scale catastrophe, exacerbation of societal inequalities, and economic disruption. (Note that my research doesn't cover all of these.)

Research interests

Within AI safety, I mostly study generalization: how a model handles unfamiliar inputs. I think that many types of safety failures can be framed as misgeneralization. I'm especially interested in training models to recognize when they're in unfamiliar situations and to behave cautiously when they are (e.g., by asking for help).

I do a mix of theory and empirical work. I aim to design methods that are theoretically grounded and can potentially scale to very advanced AI systems, but which can be tested on (and are useful for) systems we have today.

Some representative topics:

Mitigating goal misgeneralization in video games by asking for help
Algorithms which provably avoid irreversible costs by acting cautiously
Using uncertainty quantification in LLMs to guide cautious behavior

Publications

Lead author: * | Senior author: †

AI Safety

Asking for Help Enables Safety Guarantees Without Sacrificing Effectiveness
Benjamin Plaut*, Juan Liévano-Karim, Stuart Russell†. Under submission.
Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A
Benjamin Plaut*, Nguyen X. Khanh, Tu Trinh. Under submission.
Learning to Coordinate with Experts
Mohamad H. Danesh*, Tu Trinh, Benjamin Plaut, Nguyen X. Khanh†. Under submission.
Avoiding Catastrophe in Online Learning by Asking for Help
Benjamin Plaut*, Hanlin Zhu, Stuart Russell†. ICML 2025.
Getting By Goal Misgeneralization With a Little Help From a Mentor
Tu Trinh*, Mohamad Danesh, Nguyen X. Khanh, Benjamin Plaut†. NeurIPS 2024 Workshop on Safe and Trustworthy Agents.

Resource Allocation

Algorithms for Fair Public and Private Resource Allocation
Benjamin Plaut. PhD dissertation (2021).
Counteracting Inequality in Markets via Convex Pricing
Ashish Goel†, Benjamin Plaut*. WINE 2020.
Almost Envy-free Repeated Matching in Two-sided Markets
Sreenivas Gollapudi†, Kostas Kollias, Benjamin Plaut*. WINE 2020.
Optimal Nash Equilibria for Bandwidth Allocation
Benjamin Plaut*. WINE 2020.
Equality of Power and Fair Public Decision-making
Nicole Immorlica†, Benjamin Plaut*, E. Glen Weyl†. WINE 2019.
Markets Beyond Nash Welfare for Leontief Utilities
Ashish Goel†, Reyna Hulett, Benjamin Plaut*. WINE 2019.
Communication Complexity of Discrete Fair Division
Benjamin Plaut*, Tim Roughgarden†. SODA 2019, SICOMP 2020.
Markets for Public Decision-making
Nikhil Garg, Ashish Goel†, Benjamin Plaut*. WINE 2018.
Almost Envy-Freeness with General Valuations
Benjamin Plaut*, Tim Roughgarden†. SODA 2018, SIDMA 2020.
Algorithms for Social Good: Kidney Exchange
Benjamin Plaut. Undergraduate senior thesis, advised by Tuomas Sandholm.
Hardness of the Pricing Problem in Barter Exchanges
Benjamin Plaut*, John P. Dickerson, Tuomas Sandholm†. Preprint.
Position-Indexed Formulations for Kidney Exchange
John P. Dickerson, David Manlove†, Benjamin Plaut, Tuomas Sandholm†, and John Trimble*. EC 2016.
Fast Optimal Clearing of Capped-Chain Barter Exchanges
Benjamin Plaut*, John P. Dickerson, Tuomas Sandholm†. AAAI 2016.

Misc

Direct Observation of Folding Energy Landscape of RNA Hairpin at Mechanical Loading Rates
Huizhong Xu*, Benjamin Plaut, Xiran Zhu, Maverick Chen, Udit Mavinkurve, Anindita Maiti, Guangtao Song, Krishna Murari, and Maumita Mandal†. The Journal of Physical Chemistry, 2017.

Work experience

Postdoctoral Researcher, UC Berkeley, September 2023 – present

Data Scientist, Lyft, June 2021 – May 2023

Research Intern, Google, June 2020 – November 2020

Research Intern, Google, June 2019 – September 2019

Teaching Assistant, Stanford University, April 2018 – June 2018

Course Assistant, Carnegie Mellon University, January 2014 – December 2015

Other things about me

I speak English (native) and Spanish (advanced proficiency).
I wrote an algorithmic art generator called RAMbrandt as a personal project in college.
I've composed and produced music. Check out my Spotify!
I've spent a lot of time dancing, especially West Coast Swing, fusion, and ballroom. I was a co-founder and the primary instructor of the Stanford West Coast Swing Club.