Gaurav Yadav
Production Support | Site Reliability Engineer
Reliability-driven engineer supporting distributed systems, triaging production incidents, and improving service performance. Focused on automation, observability, and reducing operational toil.
About Me
Reliability-driven engineer with 3+ years of experience supporting distributed systems, triaging production incidents, and improving service performance. Skilled in AWS, Kubernetes, Terraform, CI/CD pipelines, and Datadog, with a strong focus on automation, observability, and reducing operational toil.
Proven ability to enhance system reliability, optimize monitoring, reduce MTTR, and collaborate with engineering teams to build resilient, scalable platforms.
Technical Skills
Work Experience
Technical Support Analyst III
Cars24 Services Private Limited
- Managed production reliability for microservices using Datadog metrics, logs, traces, and dashboards, improving anomaly detection and significantly reducing MTTR.
- Led incident response end-to-end: triage, mitigation, coordination, and deep RCA using SQL and log analysis, driving long-term fixes aligned with SLOs.
- Identified performance bottlenecks and reliability gaps across distributed systems; collaborated with Engineering and DevOps to deploy preventive fixes.
- Developed automation scripts and diagnostic tools in Python/Bash to eliminate repetitive operational tasks and accelerate root-cause identification.
- Improved alerting and escalation workflows, reducing noise and enhancing on-call efficiency.
- Authored runbooks, incident documentation, and reliability best practices in Confluence.
Education
BRCM College of Engineering and Technology
2012 - 2016
Bachelor of Technology
Featured Projects
Kubernetes Deployment & Observability
Containerised and deployed a scalable microservice app on Kubernetes. Implemented HPA-based auto-scaling, Helm-based deployments, and integrated Datadog for logs, metrics, and APM to achieve end-to-end observability.
AWS Infrastructure Automation
Automated provisioning of VPC, EC2, RDS, security groups, and load balancers using Terraform modules. Implemented state locking, improving consistency and reducing manual effort.
CI/CD Pipeline Engineering
Built a full Jenkins-based CI/CD pipeline using Git, Docker, and AWS. Automated build, test, and deploy steps; added notifications, rollback logic, and validation hooks to improve deployment reliability.
Get In Touch
Always open to discussing new opportunities, reliability engineering, or just having a chat about cloud tech.
+91-98967 44504
gyadav456@gmail.com
Gurugram, Haryana