38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question.
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Transcript: https://axrp.net/episode/2025/01/20/episode-38_5-adria-garriga-alonso-detecting-ai-scheming.html
FAR.AI: https://far.ai/
FAR.AI on X (aka Twitter): https://x.com/farairesearch
FAR.AI on YouTube: https://www.youtube.com/@FARAIResearch
The Alignment Workshop: https://www.alignment-workshop.com/
Topics we discuss, and timestamps:
01:04 - The Alignment Workshop
02:49 - How to detect scheming AIs
05:29 - Sokoban-solving networks taking time to think
12:18 - Model organisms of long-term planning
19:44 - How and why to study planning in networks
Links:
Adrià's website: https://agarri.ga/
An investigation of model-free planning: https://arxiv.org/abs/1901.03559
Model-Free Planning: https://tuphs28.github.io/projects/interpplanning/
Planning in a recurrent neural network that plays Sokoban: https://arxiv.org/abs/2407.15421
Episode art by Hamish Doodles: hamishdoodles.com