Currently, we primarily oversee AI with human supervision and human-run experiments, possibly augmented by off-the-shelf AI assistants like ChatGPT or Claude. At training time, we run RLHF, where humans (and/or chat assistants)
This is a brief overview of a recent release by Transluce. You can see the full write-up on the Transluce website.
AI systems are increasingly being used as agents: scaffolded systems in which
We are launching an independent research lab that builds open, scalable technology for understanding AI systems and steering them in the public interest.
Transluce means to shine light through something to reveal its
To communicate risks, we often turn to stories. Nuclear weapons conjure stories of mutually assured destruction, briefcases with red buttons, and nuclear winter. Climate change conjures stories of extreme weather, cities overtaken by
This is a landing page for various posts I’ve written, and plan to write, about forecasting future developments in AI. I draw on the field of human judgmental forecasting, sometimes colloquially referred