resu·mail

Senior Cloud/DevOps Engineer (SDE - 3)

at SatSure

Bengaluru, India · Senior · Posted 2026-04-01

Don't apply into the void — reach the hiring manager

ResuMail finds the recruiters and hiring managers behind this Senior Cloud/DevOps Engineer (SDE - 3) role at SatSure, drafts a personalised outreach email, and schedules the send — so your application actually gets seen.


About this role

About SatSure

SatSure is a deep-tech decision intelligence company operating at the nexus of agriculture, infrastructure, and climate action. We turn earth observation data into actionable insights for governments, financial institutions, and enterprises across the developing world, at scale and with reliability. Our platform team owns the infrastructure backbone that powers SatSure's AI/ML products: multi-cloud Kubernetes clusters, LLM inference pipelines, geospatial data platforms, and the internal developer tooling used by every engineering team. If you care about infrastructure quality and want your work to have real-world impact, this is the role.

About the Role

We are looking for a Senior DevOps & MLOps Engineer to join our Platform & DevOps team. You will design, build, and operate cloud-native infrastructure that supports ML model serving, data pipelines, and developer platforms across AWS, GCP, and Azure. You will work closely with data science, product engineering, and security teams, and be expected to own large surface areas end-to-end.

This is a hands-on senior IC role. You will architect systems, write Terraform and Helm, debug production incidents, define SLOs, and contribute to platform standards adopted org-wide.

Roles & Responsibilities

ML Platform & LLM Infrastructure

Own and operate a Kubernetes-based ML platform on EKS, supporting LLM inference (KServe), distributed compute (Dask/Ray), and workflow orchestration (Apache Airflow).
Partner with data science and ML teams to design, deploy, and scale ML workloads, including GPU scheduling, autoscaling, resource isolation, and SLO-driven reliability.
Architect, deploy, and optimize Ray clusters on Kubernetes for distributed ML workloads, enabling scalable training, batch inference, and low-latency serving with efficient CPU/GPU utilization.

Multi-Cloud Platform & Infrastructure

Design, build, and maintain cloud-native infrastructure across AWS (primary), GCP, and Azure, using Kubernetes (EKS / GKE / AKS), Terraform, Helm, and ArgoCD.
Drive GitOps adoption and platform standardization: define reusable infrastructure patterns, Helm charts, and deployment workflows used across all product teams.
Manage Kubernetes platform operations: cluster lifecycle, Karpenter-based autoscaling, multi-tenancy, and workload isolation for data science and engineering teams.
Implement and maintain the service mesh (Istio): mTLS enforcement, traffic policies, and observability for inter-service communication.
Maintain and improve the internal developer platform (Backstage IDP), enabling self-service environments, a service catalog, and onboarding workflows for engineering teams.

Observability & Reliability Engineering

Build and maintain full-stack observability infrastructure: metrics (Prometheus / Mimir), logs (Loki), traces (Tempo), and dashboards (Grafana), integrated with OpenTelemetry instrumentation.
Define SLIs, SLOs, and error budget policies for production ML and platform services; lead incident response and post-mortem reviews.
Proactively identify reliability risks and drive engineering improvements to maintain 99.9%+ uptime targets.

FinOps & Cost Engineering

Implement Kubernetes cost attribution and chargeback using Kubecost / OpenCost, driving per-team visibility and FinOps decision-making for AI infrastructure.
Continuously optimize cloud spend through workload right-sizing, spot/preemptible usage, and resource scheduling strategies.

Platform Security & Governance

Manage AWS multi-account governance using Control Tower, SCPs, GuardDuty, and IAM Identity Center, ensuring security posture across all environments.
Own OIDC identity and SSO infrastructure integrated across internal tooling: Backstage, Airflow, and platform services.
Support compliance and audit processes: ISO 27001, CIS Benchmarks, Well-Architected Reviews, and VAPT assessments.

Requirements

Must Have

5+ years of hands-on platform, DevOps, or SRE experience in production environments.
Strong Kubernetes expertise: cluster operations, Helm, RBAC, autoscaling (Karpenter / Cluster Autoscaler), multi-tenancy; EKS experience preferred.
Infrastructure as Code: Terraform (advanced), Ansible; experience managing large, multi-environment IaC codebases.
AWS expertise: EC2, EKS, S3, RDS, IAM, VPC, CloudWatch, Control Tower, GuardDuty; GCP or Azure exposure is a plus.
GitOps & CI/CD: ArgoCD, Bitbucket Pipelines / Jenkins, GitOps workflows at team scale.
Observability: hands-on with Prometheus, Grafana, and at least one of Loki, Tempo, OpenTelemetry, Datadog, or ELK.
Scripting & automation: Python and Bash for tooling, automation, and platform integrations.
Strong understanding of networking, security, and cloud cost management in Kubernetes environments.

Nice to Have

Experience with ML serving infrastructure: KServe, vLLM, Ray Serve, or similar model serving frameworks.
Experience with Apache Airflow, Dask, or other data/ML pipeline orchestration at scale.
Familiarity with Backstage or similar internal developer platforms (IDP).
Istio or Envoy service mesh experience.
FinOps tooling: Kubecost, OpenCost, or cloud provider cost management tools.
OIDC / identity provider experience (Zitadel, Keycloak, or similar).
AWS Certified Solutions Architect or equivalent cloud certification.
Exposure to geospatial data workloads or satellite imagery pipelines.

Minimum Qualification

Bachelor's degree in Computer Science, Information Technology, or a related engineering discipline.

Our Stack

Kubernetes (EKS / GKE / AKS) · AWS · GCP · Azure · Terraform · Helm · ArgoCD · Istio · KServe · Apache Airflow · Dask · Backstage IDP · Prometheus · Grafana · Loki · Tempo · OpenTelemetry · Kubecost · Python · Bash

Why SatSure

Real Production Scale: LLM inference, geospatial data pipelines, and multi-cloud Kubernetes, not toy projects.
High Ownership: You architect systems end-to-end. No tickets-only culture, no hand-holding required.
Meaningful Impact: Your infrastructure powers products used by governments and institutions across the developing world.
Growth & Benefits: Learning allowances, broadband, medical insurance, best-in-class leave policy, and hybrid work from Bengaluru.

How to get this job at SatSure

  1. Don't rely on the portal. Cold applications for a role like Senior Cloud/DevOps Engineer (SDE - 3) land in a pile of hundreds. A direct, personalised message to the hiring manager or a referrer is the fastest way in.
  2. Find the right person. ResuMail surfaces the actual recruiters and hiring managers at SatSure — not a generic careers inbox.
  3. Send tailored outreach. ResuMail drafts an email personalised to your resume and this role, then paces and schedules sends so you stay out of spam.
  4. Follow up. One polite nudge after 5–7 days roughly doubles reply rates — scheduled for you.

Reach SatSure's hiring managers today.

Free to start. No credit card. Built for Indian job seekers.
