Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


Diverging Preferences: When do Annotators Disagree and do Models Know?

Michael J.Q. Zhang, Zhilin Wang, Jena D. Hwang, Valentina Pyatkin
2025
ICML

We examine diverging preferences in human-labeled preference datasets. We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes -- task underspecification,… 

SafetyAnalyst: Interpretable, transparent, and steerable safety moderation for AI behavior

Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Sydney Levine
2025
ICML

The ideal AI safety moderation system would be both structurally interpretable (so its decisions can be reliably explained) and steerable (to align to safety standards and reflect a community's… 

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Yue Yang, Fan-Yun Sun, Luca Weihs, Christopher Clark
2025
Computer Vision and Pattern Recognition

3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation,… 

ACE2: accurately learning subseasonal to decadal atmospheric variability and forced responses

Oliver Watt-Meyer, Brian Henn, Jeremy McGibbon, Christopher S. Bretherton
2025
npj Climate and Atmospheric Science

Existing machine learning models of weather variability are not formulated to enable assessment of their response to varying external boundary conditions such as sea surface temperature and… 

Applying the ACE2 Emulator to SST Green's Functions for the E3SMv3 Global Atmosphere Model

Elynn Wu, F. Rebassoo, Pappu Paul, Christopher S. Bretherton
2025
arXiv

Green's functions are a useful technique for interpreting atmospheric state responses to changes in the spatial pattern of sea surface temperature (SST). Here we train version 2 of the Ai2 Climate… 

RewardBench: Evaluating Reward Models for Language Modeling

Nathan Lambert, Valentina Pyatkin, Jacob Daniel Morrison, Hanna Hajishirzi
2025
NAACL Findings

Reward models (RMs) are at the crux of successfully using RLHF to align pretrained models to human preferences, yet there has been relatively little work focused on evaluating those models.… 

Superlatives in Context: Modeling the Implicit Semantics of Superlatives

Valentina Pyatkin, Bonnie Webber, Ido Dagan, Reut Tsarfaty
2025
NAACL

Superlatives are used to single out elements with a maximal/minimal property. Semantically, superlatives perform a set comparison: something (or some things) has the min/max property out of a set.… 

Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference

Mingqi Gao, Yixin Liu, Xinyu Hu, Arman Cohan
2025
NAACL

Evaluating and ranking the capabilities of different LLMs is crucial for understanding their performance and alignment with human preferences. Due to the high cost and time-consuming nature of human… 

Social-RAG: Retrieving from Group Interactions to Socially Ground Proactive AI Generation to Group Preferences

Ruotong Wang, Xinyi Zhou, Lin Qiu, Amy X. Zhang
2025
CHI

AI agents are increasingly tasked with making proactive suggestions in online spaces where groups collaborate, but these suggestions can be unhelpful or even annoying when they do not fit the group's preferences or… 

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov, Ashish Sabharwal
2025
ICLR

Multiple-choice question answering (MCQA) is a key competence of performant transformer language models that is tested by mainstream benchmarks. However, recent evidence shows that models can have… 
