SRE at Google: Planet-scale observability - OpenObservability Talks S2E05
Have you ever wondered how services are operated at Google’s scale? Here’s your opportunity to find out. Ramón will share how his SRE team runs Google’s identity services, and the elaborate end-to-end observability they rely on to do so under a strict SLA. We’ll also get a glimpse at the birthplace of Kubernetes, OpenCensus, Dapper, Monarch and other cornerstones of today’s cloud-native DevOps and observability.
Ramón Medrano Llamas (@rmedranollamas) is a staff site reliability engineer at Google, focused on user identity and authentication. He concentrates on the reliability aspects of new Google products and new features of existing products, ensuring that they meet the same high bar as every other Google service. Before joining Google in 2013, he worked at CERN designing and developing distributed systems for physics. He holds a master’s degree in computer science and is pursuing a PhD in distributed systems.
The episode was live-streamed on 26 October 2021 and the video is available at https://youtube.com/live/jVTZf1SXZrg
Show Notes:
- scale and size of Google Identity services operation
- evolution from monitoring to observability
- telemetry collection
- SRE job description is changing
- Google Dapper
- Google Census
- operating end-to-end observability at scale
- flexibility vs. runbook in SRE
- how SRE at Google is different
- transition from monolith to microservices architecture (MSA)
- Linux Foundation launching a DevOps bootcamp
- Parca OSS launched
- how to introduce SRE culture
Resources: