-Ensure service reliability, availability, and performance (SLA/SLO)
-Monitor systems, troubleshoot issues, and drive incident resolution & RCA
-Build and maintain CI/CD pipelines and release automation
-Automate infrastructure using Infrastructure-as-Code (ARM/Terraform/Bicep)
-Implement monitoring, alerting, and observability solutions
-Partner with engineering teams to enable safe and efficient deployments
-Improve system scalability, resilience, and operational efficiency

-Bachelor's degree in Computer Science or equivalent
-2-4+ years in SRE / DevOps / Software Engineering
-Strong coding/scripting skills (Python, C#, Java, or similar)
-CI/CD: Azure DevOps, GitHub Actions, Jenkins
-Monitoring: Azure Monitor, Prometheus, Grafana
-OS: Linux / Windows
-Experience with large-scale distributed systems or data platforms
-Familiarity with:
  -Containers and orchestration (Docker, Kubernetes)
  -Observability stacks and telemetry systems