<p><strong>[About the role]</strong></p>
<p>We are looking for a Senior DevOps / Platform Engineer to own the end-to-end cloud infrastructure, reliability, security, and CI/CD for our platform products. This is a high-ownership, hands-on role responsible for building and operating production-grade infrastructure that supports high availability, regulatory compliance, scalability, and cost efficiency. You will act as the platform backbone for fast-moving product and engineering teams.</p>
<p><strong>[Key Responsibilities]</strong></p>
<p><strong>Infrastructure &amp; Platform</strong></p>
<ul>
<li>Design, deploy, and operate cloud-native infrastructure on AWS or Azure</li>
<li>Own Kubernetes-based microservices infrastructure across environments (prod, staging, dev)</li>
<li>Define infra standards, environments, networking, and release guardrails</li>
<li>Manage databases, caches, queues, and supporting infra from a platform perspective</li>
</ul>
<p><strong>Reliability &amp; Availability</strong></p>
<ul>
<li>Define and enforce SLOs / SLAs for critical user and payment flows</li>
<li>Build for high availability, auto-scaling, and fault tolerance</li>
<li>Implement zero-downtime deployments and safe rollout strategies</li>
<li>Own incident response, postmortems (RCA), and preventive action plans</li>
<li>Plan capacity and scaling for traffic spikes and seasonal peaks</li>
</ul>
<p><strong>CI/CD &amp; Release Engineering</strong></p>
<ul>
<li>Build and maintain CI/CD pipelines for backend, frontend, and mobile applications</li>
<li>Standardize deployment pipelines with rollback, approvals, and auditability</li>
<li>Enable faster, safer releases without compromising reliability</li>
</ul>
<p><strong>Observability &amp; Monitoring</strong></p>
<ul>
<li>Own the observability stack across services and infrastructure</li>
<li>Implement monitoring for:
<ul>
<li>Latency, throughput, error rates</li>
<li>Infrastructure health</li>
<li>Payment and checkout flows</li>
</ul>
</li>
<li>Define alerting strategies aligned with business impact, not noise</li>
</ul>
<p><strong>Security &amp; Compliance</strong></p>
<ul>
<li>Own platform-level security posture and compliance readiness</li>
<li>Implement and manage:
<ul>
<li>Secrets management and key rotation</li>
<li>IAM, RBAC, and least-privilege access</li>
<li>Network security, TLS, WAFs, and audit logging</li>
</ul>
</li>
<li>Partner with security/compliance stakeholders during audits and reviews</li>
<li>Ensure infrastructure is audit-ready (logs, access trails, change history)</li>
</ul>
<p><strong>Cost &amp; Efficiency</strong></p>
<ul>
<li>Monitor and optimize cloud costs across environments</li>
<li>Design infra with cost-efficiency in mind without sacrificing reliability</li>
<li>Provide visibility into infra usage and cost drivers</li>
</ul>
<p><strong>[Required Experience]</strong></p>
<ul>
<li>4+ years of experience in DevOps / SRE / Platform Engineering</li>
<li>Hands-on experience running production-grade Kubernetes workloads</li>
<li>Experience supporting high-availability, high-throughput systems</li>
<li>Strong understanding of microservices infrastructure</li>
<li>Experience working with fintech, payments, or transaction-heavy systems</li>
<li>Proven ownership of uptime, reliability, and incident management</li>
</ul>
<p><strong>Expected Tech Exposure</strong></p>
<ul>
<li>Cloud: AWS or Azure</li>
<li>Containers: Docker, Kubernetes</li>
<li>CI/CD: GitHub Actions, Jenkins, Fastlane</li>
<li>Infrastructure as Code: Terraform / ARM templates / Pulumi</li>
<li>Databases &amp; Caches: PostgreSQL, Redis</li>
<li>Messaging: Kafka / Event Hubs</li>
<li>Security: Vaults, SSL/TLS, key rotation, network policies</li>
<li>Observability: Prometheus, Grafana, Azure Monitor, Sentry</li>
</ul>
<p><strong>Good-to-Have</strong></p>
<ul>
<li>Experience with UPI, wallets, payment gateways, or banking systems</li>
<li>Exposure to multi-cloud or hybrid environments</li>
<li>Familiarity with compliance standards (PCI DSS, SOC-style controls, audit workflows)</li>
<li>Experience defining SLOs, error budgets, and release safety mechanisms</li>
</ul>
<p><strong>You’ll Be Successful If</strong></p>
<ul>
<li>You think in systems, reliability, and failure scenarios</li>
<li>You proactively prevent outages, not just react to them</li>
<li>You enjoy owning infrastructure end-to-end, not just tooling</li>
<li>You build platforms that enable product teams to move fast, safely</li>
<li>You’re comfortable being accountable for uptime, security, and scale</li>
</ul>