CloudOps Engineer
Job Description
Role Summary
The CloudOps Engineer is responsible for the day-to-day operations, reliability, performance, and cost optimization of cloud environments across infrastructure, applications, and data platforms. This role ensures secure, scalable, and highly available cloud operations by implementing automation, monitoring, and governance aligned with organizational standards.
Key Responsibilities
Cloud Operations & Reliability
Manage and monitor cloud infrastructure, applications, and platform services to ensure high availability and performance
Implement incident management, root cause analysis (RCA), and problem resolution
Ensure uptime, reliability, and performance using SRE practices (SLI/SLO/SLA)
Handle on-call support and production issues
Platform & Environment Management
Operate and maintain cloud environments (dev/test/stage/prod)
Manage subscriptions/accounts, RBAC, IAM roles, and access controls
Maintain network configurations, VMs, containers, storage, and platform services
Support deployment pipelines and environment provisioning
Automation & DevOps Enablement
Develop and maintain Infrastructure-as-Code (Terraform/Bicep/CloudFormation)
Automate deployment, scaling, patching, and configuration management
Support CI/CD pipelines and ensure smooth release management
Implement auto-scaling and self-healing mechanisms
Monitoring, Logging & Observability
Implement and manage monitoring tools, alerts, dashboards, and logging frameworks
Ensure proactive detection of issues using metrics, logs, and traces
Optimize system performance and reduce downtime
Security, Compliance & Governance
Enforce security best practices (IAM, encryption, network security)
Ensure compliance with organizational policies and regulatory requirements
Implement backup, disaster recovery (DR), and business continuity planning (BCP)
Monitor cloud costs, resource tagging, and budget controls
Migration & Support
Support cloud migration activities (rehost, replatform) from an operations perspective
Validate deployment readiness, rollback strategies, and runbooks
Ensure smooth transition to production and post-go-live support
Collaboration & Continuous Improvement
Work closely with DevOps, Developers, Architects, and Security teams
Improve operational efficiency through automation and optimization
Document runbooks, SOPs, and operational procedures
Required Experience
5–10 years in Cloud Operations / DevOps / SRE roles
Hands-on experience with Azure / AWS / GCP cloud platforms
Strong knowledge of:
Infrastructure-as-Code (Terraform/Bicep/CloudFormation)
CI/CD tools (Azure DevOps, Jenkins, GitHub Actions)
Monitoring tools (CloudWatch, Azure Monitor, Prometheus, Grafana)
Experience with:
Containers (Docker, Kubernetes)
Networking, IAM, and security best practices
Familiarity with incident management, DR/BCP, and cost optimization