Owain Evans - AI Situational Awareness, Out-of-Context Reasoning

2:15:46
 

Owain Evans is an AI alignment researcher, a research associate at the Center for Human-Compatible AI (CHAI) at UC Berkeley, and now leads a new AI safety research group.

In this episode, we discuss two of his recent papers, “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs” and “Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data”, along with some questions from Twitter.

LINKS

Patreon: https://www.patreon.com/theinsideview

Manifund: https://manifund.org/projects/making-52-ai-alignment-video-explainers-and-podcasts

Ask questions: https://twitter.com/MichaelTrazzi

Owain Evans: https://twitter.com/owainevans_uk

OUTLINE

(00:00:00) Intro

(00:01:12) Owain's Agenda

(00:02:25) Defining Situational Awareness

(00:03:30) Safety Motivation

(00:04:58) Why Release A Dataset

(00:06:17) Risks From Releasing It

(00:10:03) Claude 3 on the Longform Task

(00:14:57) Needle in a Haystack

(00:19:23) Situating Prompt

(00:23:08) Deceptive Alignment Precursor

(00:30:12) Distribution Over Two Random Words

(00:34:36) Discontinuing a 01 Sequence

(00:40:20) GPT-4 Base On the Longform Task

(00:46:44) Human-AI Data in GPT-4's Pretraining

(00:49:25) Are Longform Task Questions Unusual

(00:51:48) When Will Situational Awareness Saturate

(00:53:36) Safety And Governance Implications Of Saturation

(00:56:17) Evaluation Implications Of Saturation

(00:57:40) Follow-up Work On The Situational Awareness Dataset

(01:00:04) Would Removing Chain-Of-Thought Work?

(01:02:18) Out-of-Context Reasoning: the "Connecting the Dots" paper

(01:05:15) Experimental Setup

(01:07:46) Concrete Function Example: 3x + 1

(01:11:23) Isn't It Just A Simple Mapping?

(01:17:20) Safety Motivation

(01:22:40) Out-Of-Context Reasoning Results Were Surprising

(01:24:51) The Biased Coin Task

(01:27:00) Will Out-Of-Context Reasoning Scale

(01:32:50) Checking If In-Context Learning Works

(01:34:33) Mixture-Of-Functions

(01:38:24) Inferring New Architectures From ArXiv

(01:43:52) Twitter Questions

(01:44:27) How Does Owain Come Up With Ideas?

(01:49:44) How Did Owain's Background Influence His Research Style And Taste?

(01:52:06) Should AI Alignment Researchers Aim For Publication?

(01:57:01) How Can We Apply LLM Understanding To Mitigate Deceptive Alignment?

(01:58:52) Could Owain's Research Accelerate Capabilities?

(02:08:44) How Was Owain's Work Received?

(02:13:23) Last Message
