Site Reliability Engineering (SRE)

Bengaluru, India | Mid | Posted 2026-01-09

About this role

Job Title: Site Reliability Engineering (SRE)
Location: Onsite preferred; remote is OK
Experience: 2-5 years

Technical Qualifications

Must-Have Skills

Experience: 2-5 years in SRE, DevOps, or Systems Engineering roles with a strong focus on AWS.
Cloud Proficiency: Expert-level knowledge of AWS core services and architecture standards.
Scripting: Strong proficiency in Python or Shell/Bash for automation.
Cost Tools: Experience with AWS Cost Explorer, Trusted Advisor, or third-party tools (e.g., CloudHealth) to drive financial efficiency.
Monitoring: Hands-on experience with tools like Grafana, Prometheus, ELK Stack, or Splunk.

Preferred Qualifications

Experience in Hybrid Cloud environments (AWS + On-Prem/Data Center).
Knowledge of container orchestration (Kubernetes/EKS).
Understanding of database administration and replication (PostgreSQL, MySQL, or DynamoDB).

System Ownership & Reliability

End-to-End Ownership: Own the health and lifecycle of production systems, ensuring high availability (HA) and meeting strict Service Level Objectives (SLOs).
Deep-Dive Debugging: Troubleshoot and resolve complex issues across infrastructure, application code, and networking layers. You will be the escalation point for hard-to-solve production incidents.
Incident Management: Lead Root Cause Analysis (RCA) processes for outages, driving permanent fixes and architectural changes to prevent recurrence.

Operational Excellence & Security

Disaster Recovery (DR): Design and manage DR strategies; conduct periodic failover drills to ensure business continuity.
Security & Compliance: Oversee OS patching, vulnerability scanning, and adherence to industry compliance standards (SOC2/HIPAA/ISO). Maintain strict IAM policies and security groups.
Observability: Build and maintain comprehensive monitoring, logging, and alerting frameworks (CloudWatch, Prometheus, Datadog) to ensure early detection of anomalies.
Maintenance: Define and maintain backup/restore processes and routine maintenance windows with minimal downtime.

SRE & Automation

Eliminate Toil: Apply SRE principles to automate repetitive operational tasks, reducing manual intervention.
IaC & Tooling: Develop automation tools and manage infrastructure using Terraform or CloudFormation, along with scripting in Python, Go, or Bash.
Self-Healing Systems: Implement auto-remediation workflows where systems can detect and resolve common issues (e.g., restarting failed services, rotating bad nodes) without human intervention.
Performance Tuning: Optimize application runtime parameters, database queries, and system kernel settings for maximum throughput.

Cloud & Cost Optimization (FinOps)

AWS Management: Architect and manage extensive AWS services: EC2, EKS/ECS, RDS, S3, Lambda, VPC, and Route53.
Cost Efficiency: Actively monitor cloud spend and drive cost optimization initiatives. This includes rightsizing instances, managing Reserved/Spot instances, and identifying idle resources to reduce waste.
Capacity Planning: Collaborate with engineering teams to forecast infrastructure needs, ensuring we scale to meet demand without over-provisioning.

Work Environment & Soft Skills

Global Flexibility: We work with clients across IST, GMT, and EST time zones. You must be flexible with your working hours to accommodate project-specific deployments, overlapping meetings, or on-call rotations.
Team Player: Willingness to help out with other cloud-related workloads (even outside your primary AWS focus) when the team is under pressure.
Detective Mindset: You are relentless when debugging and won't stop until you find the root cause.
Financial Awareness: You treat cloud resources as real money and take pride in running a lean, efficient infrastructure.

How to get this job at Lyzr

Don't rely on the portal. Cold applications for a role like Site Reliability Engineering (SRE) land in a pile of hundreds. A direct, personalised message to the hiring manager or a referrer is the fastest way in.
Find the right person. ResuMail surfaces the actual recruiters and hiring managers at Lyzr — not a generic careers inbox.
Send tailored outreach. ResuMail drafts an email personalised to your resume and this role, then paces and schedules sends so you stay out of spam.
Follow up. One polite nudge after 5–7 days roughly doubles reply rates — scheduled for you.

Reach Lyzr's hiring managers today.

Free to start. No credit card. Built for Indian job seekers.
