
Content provided by Adam Hawkins. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Adam Hawkins or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://es.player.fm/legal.

Incidents & Operations with Dan Slimmon

1:01:48
 
 


In this episode, Adam welcomes Dan Slimmon, an experienced Site Reliability Engineer (SRE), to discuss incident response and troubleshooting in software engineering. Dan explains his methodology for clinical troubleshooting, the importance of maintaining a common mental model, and techniques for leading effective incident response efforts. They also delve into the value of continuous ops reviews and ongoing mental model updates to prevent issues, emphasizing the need for structured processes and effective communication.

Want more?

Chapters

  • (00:00) - Incidents & Operations
  • (01:14) - Guest Welcome
  • (01:40) - Dan's Career Journey
  • (02:33) - Evolution of Tech Stacks
  • (04:59) - Clinical Troubleshooting Explained
  • (11:53) - Incident Response Fundamentals
  • (17:41) - Effective Communication in Incidents
  • (26:09) - Training for Incident Response
  • (33:22) - The Essence of Incident Response
  • (33:53) - Balancing Short-Term and Long-Term Fixes
  • (35:01) - The Firefighting Analogy in Software Incidents
  • (37:11) - Postmortems: Learning from Incidents
  • (42:14) - Building a Shared Mental Model
  • (42:41) - Looking for Trouble: Proactive System Monitoring
  • (47:59) - Ops Reviews: Continuous Improvement
  • (54:37) - The Importance of Closing the Feedback Loop
  • (59:40) - Final Thoughts and Resources
★ Support this podcast on Patreon ★

117 episodes

