Role: Staff MLOps Engineer (Manager)
Experience: 10+ years
Location: Hyderabad, India
About the Role
We are looking for a Staff MLOps Engineer (Manager) to lead the design, build-out, and scaling of enterprise-grade MLOps and DevOps platforms. This role combines hands-on engineering excellence with team leadership, focusing on productionizing machine learning systems, enabling developer productivity, and driving automation at scale.
You will work at the intersection of ML engineering, cloud infrastructure, and platform engineering, helping teams deliver reliable, scalable, and secure ML solutions in production.
Key Responsibilities
Lead and mentor a team of MLOps/DevOps engineers, driving technical excellence and delivery outcomes
Architect, build, and scale end-to-end MLOps platforms and CI/CD pipelines for ML workloads
Design and implement automated deployment pipelines for training, testing, and model serving at scale
Operationalize ML models into production with a focus on performance, reliability, and observability
Partner with data scientists and engineering teams to enable self-service ML platforms and developer tooling
Implement Infrastructure-as-Code (IaC) and automation frameworks for cloud environments
Ensure platform compliance with security, governance, and reliability standards
Troubleshoot complex production issues and continuously improve developer experience and system resilience
Drive best practices for CI/CD, testing, monitoring, and release management across ML and data platforms
Evaluate and optimize environments supporting large-scale data pipelines and ML workflows
Required Qualifications
10+ years of experience in DevOps, MLOps, or Platform Engineering roles
5+ years of people management experience, leading teams of 5+ engineers
Strong hands-on expertise in building and scaling MLOps pipelines and platforms
Proven experience with Infrastructure-as-Code (Terraform preferred) in public cloud environments
Deep experience with CI/CD tools such as GitHub Actions and Jenkins, and with code quality/security tools (e.g., Snyk)
Strong knowledge of MLOps and orchestration frameworks such as Airflow, Kubeflow, MLflow, or similar
Experience deploying and managing ML models in production at scale
Hands-on experience with distributed data processing frameworks such as Apache Spark, EMR, or Databricks
Strong programming skills in Python (preferred) or Node.js/Bash
Experience with containerization and orchestration (Docker, Kubernetes)
Strong understanding of cloud platforms (AWS, Azure, or GCP) and cloud-native services
Experience with data platforms and services such as Snowflake, Redshift, Glue, BigQuery, or similar
Solid understanding of distributed systems, monitoring, logging, and reliability engineering
Experience with Git-based workflows and version control best practices
Preferred Qualifications
Experience with configuration management tools (Ansible, Chef, Puppet)
Familiarity with ML libraries and frameworks such as scikit-learn, PyTorch, TensorFlow
Exposure to large-scale inference systems and batch/real-time scoring architectures
Experience supporting multi-runtime environments (Node.js, Java/Spark/Scala, React)
Cloud certifications (AWS/GCP/Azure)
What We’re Looking For
Strong problem-solving mindset with a passion for automation and scalability
Ability to balance hands-on engineering with team leadership
Focus on developer experience, platform reliability, and operational excellence
Excellent collaboration skills across data, engineering, and architecture teams