About me
I am a postdoctoral research fellow at the Center for Human-Compatible AI (CHAI) at UC Berkeley, where I study AI safety. I am grateful to be mentored by Stuart Russell. In 2021, I received my PhD from the Computer Science Department at Stanford University, advised by Ashish Goel and supported by an NSF Graduate Research Fellowship. I also spent two years doing science and product work at Lyft.
I want to use my career to do good in the world. In general, I am interested in ensuring that technology is safe and beneficial. During my PhD, I approached this goal from an economic perspective: how to design markets and algorithms that achieve good outcomes even when people are selfish. Over time, I became increasingly concerned about the rapid pace of AI development, which led me to pivot to the field of AI safety. I'm concerned about a wide range of risks from AI, including but not limited to LLM hallucination, medical errors, critical infrastructure failures, and societal-scale catastrophe.
Research interests
Within AI safety, I mostly study generalization: how a model handles unfamiliar inputs. I think that many types of safety failures can be framed as misgeneralization. I'm especially interested in training models to recognize when they're in unfamiliar situations and to behave cautiously when they are (e.g., by asking for help).
I do a mix of theory and empirical work. I aim to design methods that are theoretically grounded and can potentially scale to very advanced AI systems, but that can also be tested on (and are useful for) the systems we have today.
Some representative projects:
- Mitigating goal misgeneralization in video games by asking for help
- Designing algorithms which provably avoid irreversible costs by asking for help
- Quantifying the uncertainty signals in the output probabilities of LLMs
Publications
Lead author: * | Senior author: †
AI Safety
- Avoiding Catastrophe in Online Learning by Asking for Help
  Benjamin Plaut*, Hanlin Zhu, Stuart Russell†. Under submission.
- Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A
  Benjamin Plaut*, Nguyen X. Khanh, Tu Trinh. Under submission.
- Getting By Goal Misgeneralization With a Little Help From a Mentor
  Tu Trinh*, Mohamad Danesh, Nguyen X. Khanh, Benjamin Plaut†. NeurIPS 2024 Workshop on Safe and Trustworthy Agents.
Resource Allocation
- Algorithms for Fair Public and Private Resource Allocation
  Benjamin Plaut. PhD dissertation (2021).
- Counteracting Inequality in Markets via Convex Pricing
  Ashish Goel†, Benjamin Plaut*. WINE 2020.
- Almost Envy-free Repeated Matching in Two-sided Markets
  Sreenivas Gollapudi†, Kostas Kollias, Benjamin Plaut*. WINE 2020.
- Optimal Nash Equilibria for Bandwidth Allocation
  Benjamin Plaut*. WINE 2020.
- Equality of Power and Fair Public Decision-making
  Nicole Immorlica†, Benjamin Plaut*, E. Glen Weyl†. WINE 2019.
- Markets Beyond Nash Welfare for Leontief Utilities
  Ashish Goel†, Reyna Hulett, Benjamin Plaut*. WINE 2019.
- Communication Complexity of Discrete Fair Division
  Benjamin Plaut*, Tim Roughgarden†. SODA 2019, SICOMP 2020.
- Markets for Public Decision-making
  Nikhil Garg, Ashish Goel†, Benjamin Plaut*. WINE 2018.
- Almost Envy-Freeness with General Valuations
  Benjamin Plaut*, Tim Roughgarden†. SODA 2018, SIDMA 2020.
- Algorithms for Social Good: Kidney Exchange
  Benjamin Plaut. Undergraduate senior thesis, advised by Tuomas Sandholm.
- Hardness of the Pricing Problem in Barter Exchanges
  Benjamin Plaut*, John P. Dickerson, Tuomas Sandholm†. Working paper.
- Position-Indexed Formulations for Kidney Exchange
  John P. Dickerson, David Manlove†, Benjamin Plaut, Tuomas Sandholm†, and John Trimble*. EC 2016.
- Fast Optimal Clearing of Capped-Chain Barter Exchanges
  Benjamin Plaut*, John P. Dickerson, Tuomas Sandholm†. AAAI 2016.
Misc
- Direct Observation of Folding Energy Landscape of RNA Hairpin at Mechanical Loading Rates
  Huizhong Xu*, Benjamin Plaut, Xiran Zhu, Maverick Chen, Udit Mavinkurve, Anindita Maiti, Guangtao Song, Krishna Murari, and Maumita Mandal†. The Journal of Physical Chemistry, 2017.
Work experience
Other things about me
- I speak English (native) and Spanish (advanced proficiency).
- I wrote an algorithmic art-generator called RAMbrandt as a personal project in college.
- I've composed and produced music. Check out my Spotify!
- I've spent a lot of time dancing, especially West Coast Swing, fusion, and ballroom. I was a co-founder and the primary instructor of the Stanford West Coast Swing Club.
