Site Reliability Engineer - 2

Bengaluru, India | Mid | Posted 2025-11-05

About this role

About Signzy

Signzy is an AI-powered RPA platform for financial services. However complex your workflows or operations, Signzy can turn your back-operations decision-making process into a fully automated real-time API. This is possible due to a combination of Nebula, our no-code AI model builder, and our Fintech API Marketplace of 200+ APIs. Today we work with 90+ FIs globally, including the 4 largest banks in India and a Top 3 acquiring bank in the US. We have a strong global partnership with MasterCard, and offices in New York and Dubai to serve our customers in those 2 geographies. Our Product team of 120+ people is building a global AI product out of Bangalore.

Working at Signzy

At Signzy we breathe software and exploit the latest technologies to create the most amazing products. We are a tech-savvy team, backed by investors who are enthusiastic about creating solutions using technology. This is an invitation to be a part of the future!

Role Overview

We are looking for a Site Reliability Engineer (SRE-2) to help design, operate, and improve reliable, scalable systems in cloud and Kubernetes environments. This role involves close collaboration with engineering and platform teams to automate operations, improve observability, and ensure production systems remain stable and performant as they scale. You will work on infrastructure, deployment pipelines, and operational tooling while actively participating in incident response and long-term reliability improvements.

Responsibilities

Design, deploy, and operate reliable and scalable systems across cloud and Kubernetes environments.
Automate infrastructure provisioning, deployments, and operational workflows.
Build and maintain tools for deployment, monitoring, and system operations.
Monitor system health and performance, and proactively identify areas for improvement.
Troubleshoot and resolve issues across development, test, and production environments.
Participate in incident response, root cause analysis, and reliability improvements.
Collaborate with engineering teams to improve system operability and deployment safety.
Support and operate large-scale systems, including data-intensive or AI-driven workloads.

Requirements

3–5+ years of experience managing and operating production infrastructure and services in cloud environments such as AWS, Azure, or GCP.
Strong hands-on experience with Linux systems in production environments.
Experience working with containerized workloads and Kubernetes in real-world scenarios.
Working knowledge of Infrastructure as Code tools such as Terraform, Terragrunt, or Crossplane.
Experience designing and maintaining CI/CD pipelines using tools such as GitHub Actions, GitLab CI, Jenkins, Azure DevOps, or similar.
Familiarity with GitOps principles and tools such as Argo CD or Flux.
Solid understanding of cloud networking concepts, load balancing, and service connectivity.
Experience with monitoring, logging, and alerting systems such as Prometheus, Grafana, ELK/EFK, Datadog, or equivalent.
Proficiency in at least one scripting or programming language (e.g., Bash, Python).
Experience working with relational databases; exposure to NoSQL or data platforms is a plus.
Experience participating in on-call rotations, responding to production incidents, and performing root cause analysis.
Understanding of SLIs, SLOs, and error budgets, and how they are used to guide reliability and operational decisions.
Strong problem-solving skills and the ability to debug complex production issues.
Good verbal and written communication skills, especially during incidents and technical discussions.

Nice to Have

Experience operating systems at scale or in high-availability environments.
Exposure to on-prem or hybrid infrastructure.
Experience supporting data platforms, analytics, or AI/ML workloads.

What We Value

A strong sense of ownership and responsibility for production systems.
A focus on automation, reliability, and operational simplicity.
The ability to balance speed, stability, and long-term maintainability.
Curiosity and willingness to continuously improve systems and processes.

How to get this job at Signzy

Don't rely on the portal. Cold applications for a role like Site Reliability Engineer - 2 land in a pile of hundreds. A direct, personalised message to the hiring manager or a referrer is the fastest way in.
Find the right person. ResuMail surfaces the actual recruiters and hiring managers at Signzy — not a generic careers inbox.
Send tailored outreach. ResuMail drafts an email personalised to your resume and this role, then paces and schedules sends so you stay out of spam.
Follow up. One polite nudge after 5–7 days roughly doubles reply rates — scheduled for you.
