
Content provided by HackerNoon. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by HackerNoon or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://es.player.fm/legal.

AI Safety and Alignment: Could LLMs Be Penalized for Deepfakes and Misinformation?

8:10
 

This story was originally published on HackerNoon at: https://hackernoon.com/ai-safety-and-alignment-could-llms-be-penalized-for-deepfakes-and-misinformation-ecabdwv.
Penalty-tuning for LLMs: a proposal that models could be penalized, within their awareness, for misuse or negative outputs, as another channel for AI safety and alignment.
Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #ai-safety, #ai-alignment, #agi, #superintelligence, #llms, #deepfakes, #misinformation, #hackernoon-top-story, and more.
This story was written by: @davidstephen. Learn more about this writer by checking @davidstephen's about page, and for more stories, please visit hackernoon.com.
A research area for AI safety and alignment could be to explore how some of the memory or compute access of large language models (LLMs) might be briefly truncated as a form of penalty for certain outputs or misuses, including biological threats. An AI system should be able not only to refuse an output, acting within its guardrails, but also to slow its next response or shut down for that user, so that it is not penalized itself. LLMs have large language awareness and usage awareness; these could be channels for making a model know, after pre-training, that it could lose something if it outputs deepfakes, misinformation, or biological threats, or if it keeps letting a misuser try different prompts without shutting down or slowing its responses in the face of malicious intent. This could make the model safer, since it would lose something and would know that it has.


316 episodes

