Role Overview
We are looking for a highly skilled Senior DevOps Engineer who can design, implement, and scale modern cloud-native infrastructure while integrating AI-driven automation (AIOps) into development, deployment, and monitoring workflows.
This role requires deep expertise in cloud platforms, Kubernetes, CI/CD, and infrastructure as code, along with the ability to leverage AI tools to improve operational efficiency, incident management, and developer productivity.
Key Responsibilities
DevOps & Infrastructure
Design, build, and maintain scalable, secure, and highly available cloud infrastructure (primarily AWS).
Implement and manage containerized environments using Kubernetes (EKS preferred).
Develop and maintain Infrastructure as Code (IaC) using tools like Terraform.
Build and optimize CI/CD pipelines using GitHub Actions / Bitbucket Pipelines.
Ensure high system reliability, uptime, and performance across environments.
AI-Driven DevOps (AIOps)
Implement AI-powered monitoring and alerting for proactive issue detection and resolution.
Leverage AI tools for:
Log analysis and anomaly detection
Predictive incident management
Root cause analysis (RCA) acceleration
Integrate LLM-based assistants (internal or external) to:
Automate troubleshooting workflows
Generate deployment scripts / configs
Improve developer productivity
Explore and implement self-healing systems using automation and AI insights.
Security & Compliance
Ensure infrastructure security and compliance with best practices.
Implement SAST/DAST and vulnerability scanning tools.
Work with SBOM and CVE tools (e.g., syft, grype).
Manage IAM policies, secrets, and secure configurations across environments.
Collaboration & Architecture
Work closely with developers, architects, and product teams to ensure smooth delivery.
Participate in architecture design and technical decision-making.
Define and enforce DevOps best practices, standards, and processes.
Mentor junior engineers and improve team capabilities.
Operations & Incident Management
Perform root cause analysis (RCA) for production issues.
Improve observability using tools like Prometheus, Grafana, and ELK.
Ensure fast incident response and resolution using automation and AI insights.
Required Skills & Qualifications
Core Technical Skills
Strong expertise in AWS Cloud:
EC2, ECS, EKS, IAM, CloudWatch, CloudTrail
EventBridge, SQS, SNS, SES, S3, Lambda, API Gateway, WAF
Deep hands-on experience with Kubernetes (must-have)
Strong experience in Terraform (IaC) (must-have)
Experience with CI/CD tools (GitHub Actions / Bitbucket Pipelines)
Strong knowledge of Linux systems and networking fundamentals
Programming/scripting experience in Python / Shell / JavaScript
DevOps & Observability
Experience with monitoring & observability stacks: Prometheus, Grafana, ELK
Experience in serverless architectures
Knowledge of high-availability and scalable system design
Security
Experience with:
SAST / DAST tools
Vulnerability management
SBOM and vulnerability scanning tools (syft, grype)
AI / Automation (Mandatory Mindset)
Experience or strong exposure to:
AI-assisted DevOps tools (e.g., Copilot, ChatGPT-based workflows)
Log intelligence and anomaly detection systems
Ability to identify and implement automation opportunities using AI
Understanding of AI/ML fundamentals in operations (AIOps) is a strong plus
Soft Skills
Strong problem-solving and analytical thinking
Excellent communication and stakeholder management
Process-oriented with strong documentation practices
Ability to work in a fast-paced, multi-project environment
Good to Have
AWS / Kubernetes / Terraform certifications
Experience in platform engineering or internal developer platforms
Exposure to multi-cloud environments
Experience building DevOps frameworks for enterprise clients
What We Expect (Mako Context Fit)
Ability to handle multiple client environments simultaneously
Strong ownership mindset for delivery and stability
Capability to drive DevOps maturity across teams
Comfortable working in high-pressure, client-facing scenarios