Role Summary
We are hiring an AI DevOps / LLMOps Engineer to build and operate cloud-native AI platforms and pipelines for production-grade, AI-powered applications.
This is a hands-on, operations-focused role that requires deep expertise in cloud, DevOps, and MLOps/LLMOps, along with the ability to apply AI tools in day-to-day engineering workflows.
Must-Have Skills (Non-Negotiable)
Cloud & Infrastructure
Strong hands-on experience with AWS / Azure / GCP
Expertise in Infrastructure-as-Code (Terraform preferred)
Experience provisioning and managing production environments
MLOps / LLMOps
Experience building and operating ML/LLM pipelines
Knowledge of:
CI/CD for AI workloads
Model & prompt versioning
Monitoring, drift detection, and retraining
CI/CD & Platform Engineering
Experience designing CI/CD pipelines
Automation of build → deploy → monitor workflows
Containers & Orchestration
Hands-on experience with Docker and Kubernetes
Experience managing scalable, distributed workloads
AI Platform Operations
Experience supporting:
LLM-based applications (RAG, APIs, pipelines)
Vector databases and inference systems
Compute management, scaling, and performance tuning
AI-Native Engineering (Critical)
Active use of AI tools (LLMs, copilots, agents) in daily engineering work
Experience working with local/open-source LLMs
Ability to apply AI to improve automation, debugging, and operational efficiency
Core Responsibilities
Provision and manage cloud infrastructure for AI applications
Build and maintain MLOps / LLMOps pipelines
Deploy and operate AI-powered applications in production
Implement monitoring, logging, and observability
Optimize cost, performance, and resource usage
Ensure reliability, scalability, and security
Collaborate with engineering teams to productionize AI systems
Required Experience
5–10+ years in DevOps / Cloud / Platform Engineering
2–4+ years in MLOps / LLMOps / AI platforms
Strong experience with:
Python / scripting
CI/CD tools
Infrastructure-as-Code
Preferred Skills
Experience with vector databases
Exposure to LLM ecosystems
Familiarity with microservices / event-driven systems
Knowledge of AI governance and security