About me

I am a postdoctoral research fellow at the Center for Human-Compatible AI (CHAI) at UC Berkeley, where I study AI safety. I am grateful to be mentored by Stuart Russell. In 2021, I received my PhD from the Computer Science Department at Stanford University, advised by Ashish Goel and supported by an NSF Graduate Research Fellowship. I also spent two years doing science and product work at Lyft.

I want to use my career to do good in the world. In general, I am interested in ensuring that technology is safe and beneficial. During my PhD, I explored this question from an economic perspective: how to design markets and algorithms that achieve good outcomes even when people are selfish. Over time, I became increasingly concerned about the rapid pace of AI development, which led me to pivot to the field of AI safety. I'm concerned about a wide range of risks from AI, including but not limited to serious LLM errors, critical infrastructure failures, societal-scale catastrophe, exacerbation of societal inequalities, and economic disruption. (Note that my research doesn't cover all of these.)

Research interests

Within AI safety, I mostly study generalization: how a model handles unfamiliar inputs. I think that many types of safety failures can be framed as misgeneralization. I'm especially interested in training models to recognize when they're in unfamiliar situations and then behave cautiously (e.g., ask for help).

I do a mix of theory and empirical work. I aim to design methods that are theoretically grounded and can potentially scale to very advanced AI systems, but which can be tested on (and are useful for) systems we have today.

Some representative topics:

  • Mitigating goal misgeneralization in video games by asking for help
  • Algorithms which provably avoid irreversible costs by acting cautiously
  • Using uncertainty quantification in LLMs to guide cautious behavior

Publications

Lead author: * | Senior author: †

AI Safety

  1. Safe Learning Under Irreversible Dynamics via Asking for Help
    Benjamin Plaut*, Juan Liévano-Karim, Hanlin Zhu, Stuart Russell†.
    Working paper.
  2. Learning to Coordinate with Experts
    Mohamad H. Danesh*, Nguyen X. Khanh, Tu Trinh, Benjamin Plaut.
    Working paper.
  3. Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A
    Benjamin Plaut*, Nguyen X. Khanh, Tu Trinh.
    Transactions on Machine Learning Research (TMLR), 2025.
  4. Avoiding Catastrophe in Online Learning by Asking for Help
    Benjamin Plaut*, Hanlin Zhu, Stuart Russell†.
    International Conference on Machine Learning (ICML), 2025.
  5. Getting By Goal Misgeneralization With a Little Help From a Mentor
    Tu Trinh*, Mohamad Danesh, Nguyen X. Khanh, Benjamin Plaut†.
    Neural Information Processing Systems (NeurIPS) 2024 Workshop on Safe and Trustworthy Agents.

Resource Allocation Algorithms

  1. Algorithms for Fair Public and Private Resource Allocation
    Benjamin Plaut. PhD dissertation, 2021.
  2. Counteracting Inequality in Markets via Convex Pricing
    Ashish Goel†, Benjamin Plaut*. Conference on Web and Internet Economics (WINE) 2020.
  3. Almost Envy-free Repeated Matching in Two-sided Markets
    Sreenivas Gollapudi†, Kostas Kollias, Benjamin Plaut*. Conference on Web and Internet Economics (WINE) 2020.
  4. Optimal Nash Equilibria for Bandwidth Allocation
    Benjamin Plaut*. Conference on Web and Internet Economics (WINE) 2020.
  5. Equality of Power and Fair Public Decision-making
    Nicole Immorlica†, Benjamin Plaut*, E. Glen Weyl†. Conference on Web and Internet Economics (WINE) 2019.
  6. Markets Beyond Nash Welfare for Leontief Utilities
    Ashish Goel†, Reyna Hulett, Benjamin Plaut*. Conference on Web and Internet Economics (WINE) 2019.
  7. Communication Complexity of Discrete Fair Division
    Benjamin Plaut*, Tim Roughgarden†. Symposium on Discrete Algorithms (SODA), 2019; SIAM Journal on Computing (SICOMP), 2020.
  8. Markets for Public Decision-making
    Nikhil Garg, Ashish Goel†, Benjamin Plaut*. Conference on Web and Internet Economics (WINE) 2018.
  9. Almost Envy-Freeness with General Valuations
    Benjamin Plaut*, Tim Roughgarden†. Symposium on Discrete Algorithms (SODA), 2018; SIAM Journal on Discrete Mathematics (SIDMA), 2020.
  10. Algorithms for Social Good: Kidney Exchange
    Benjamin Plaut. Undergraduate honors thesis, 2016. Won the Allen Newell Award for Excellence in Undergraduate Research (best thesis in Computer Science).
  11. Hardness of the Pricing Problem in Barter Exchanges
    Benjamin Plaut*, John P. Dickerson, Tuomas Sandholm†. Preprint.
  12. Position-Indexed Formulations for Kidney Exchange
    John P. Dickerson, David Manlove†, Benjamin Plaut, Tuomas Sandholm†, and John Trimble*. Economics and Computation (EC), 2016.
  13. Fast Optimal Clearing of Capped-Chain Barter Exchanges
    Benjamin Plaut*, John P. Dickerson, Tuomas Sandholm†. AAAI Conference on Artificial Intelligence, 2016.

Physical Chemistry

  1. Direct Observation of Folding Energy Landscape of RNA Hairpin at Mechanical Loading Rates
    Huizhong Xu*, Benjamin Plaut, Xiran Zhu, Maverick Chen, Udit Mavinkurve, Anindita Maiti, Guangtao Song, Krishna Murari, and Maumita Mandal†. The Journal of Physical Chemistry, 2017.

Work experience

Postdoctoral Researcher, UC Berkeley, September 2023 – present
Data Scientist, Lyft, June 2021 – May 2023
Research Intern, Google, Summer 2019 and Summer 2020

Other things about me

  • I speak English (native) and Spanish (advanced proficiency).
  • I wrote an algorithmic art-generator called RAMbrandt as a personal project in college.
  • I've composed and produced music. Check out my Spotify!