Career Area:
Technology, Digital and Data
Job Description:
Your Work Shapes the World at Caterpillar Inc.
When you join Caterpillar, you're joining a global team who cares not just about the work we do – but also about each other. We are the makers, problem solvers, and future world builders who are creating stronger, more sustainable communities. We don't just talk about progress and innovation here – we make it happen, with our customers, where we work and live. Together, we are building a better world, so we can all enjoy living in it.
Senior Engineering Manager – Data, ML, DevOps & AI Ops
About the Role
We are seeking a Senior Engineering Manager to lead our Data, ML, DevOps, and AI Ops capabilities, driving the design, development, deployment, and intelligent operation of enterprise-scale data platforms, machine learning systems, and cloud-native infrastructure.
This role is accountable for operationalizing data and AI at scale, ensuring reliability, performance, security, and continuous optimization across data pipelines, ML platforms, application infrastructure, and production environments. You will enable advanced analytics, AI-driven applications, and digital transformation initiatives by embedding automation, observability, and AI-powered operations into the core engineering ecosystem.
You will lead a multidisciplinary organization spanning Data Engineering, ML Engineering, Platform Engineering, DevOps, and AI Ops, and play a critical role in enabling real-time insights, predictive intelligence, resilient platforms, and intelligent automation across the enterprise.
Key Responsibilities
Leadership & Strategy
Provide strategic direction and technical leadership across Data Ops, ML Ops, DevOps, and AI Ops, fostering a culture of engineering excellence, automation, and operational rigor.
Define and execute the end-to-end platform strategy spanning data pipelines, ML lifecycle, CI/CD, infrastructure, and intelligent operations.
Partner with executive leadership on technology roadmaps, platform modernization, vendor strategy, and emerging capabilities in AI, DevOps, and cloud platforms.
Data, ML & Platform Engineering
Architect and scale cloud-native data platforms supporting real-time and batch ingestion, transformation, analytics, and AI workloads.
Drive ML Ops best practices for model training, deployment, monitoring, retraining, and governance across the full model lifecycle.
Ensure seamless integration of data platforms, ML services, and application ecosystems.
DevOps & Platform Reliability
Establish and mature DevOps practices, including CI/CD pipelines, infrastructure-as-code, automated testing, and release management for data, ML, and application platforms.
Ensure high availability, performance, scalability, and cost efficiency across cloud infrastructure and platform services.
Embed SRE principles, SLIs/SLOs, and resilience engineering into platform operations.
AI Ops & Intelligent Operations
Lead the adoption of AI Ops capabilities for proactive monitoring, anomaly detection, incident correlation, root cause analysis, and predictive remediation.
Integrate observability signals (logs, metrics, traces, events) across data, ML, and application platforms to enable intelligent, self-healing systems.
Drive automation to reduce manual operational overhead and improve MTTR, reliability, and platform insights.
Governance, Security & Compliance
Establish enterprise standards for data governance, model governance, security, privacy, and compliance across platforms.
Ensure platforms meet enterprise, regulatory, and cybersecurity requirements by design.
Collaboration & Talent
Collaborate with data scientists, product teams, architects, and business stakeholders to translate AI and platform strategies into production-ready solutions.
Lead talent development, hiring, and organizational design, building a high-performing, globally scalable engineering organization.
Required Qualifications
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
15+ years of experience in software, data, or platform engineering, with 5+ years in senior engineering leadership roles.
Strong expertise across Data Engineering, ML Ops, DevOps, and production platform operations.
Hands-on experience with cloud platforms (AWS, Azure, or GCP) and container orchestration (Docker, Kubernetes).
Proven experience with CI/CD pipelines, infrastructure-as-code (Terraform, ARM, CloudFormation), and automation frameworks.
Solid understanding of streaming and data platforms (Kafka, Spark, Flink) and ML Ops tooling (MLflow, Kubeflow, SageMaker).
Experience driving platform reliability, security, governance, and compliance at enterprise scale.
Strong leadership, communication, and stakeholder management skills.
Preferred Qualifications
Experience with AI Ops platforms, intelligent observability, and incident automation.
Exposure to feature stores, model registries, real-time inference, and event-driven architectures.
Knowledge of SRE practices, error budgets, and resilience engineering.
Familiarity with GPU acceleration, distributed training, and high-performance computing.
Experience with observability stacks (Prometheus, Grafana, OpenTelemetry) and log analytics platforms.
Contributions to open-source projects or published work in data platforms, ML Ops, DevOps, or AI Ops.
Why Join Us?
Lead enterprise-critical platforms at the intersection of Data, AI, DevOps, and Intelligent Operations.
Shape how AI is built, deployed, and operated at scale, not just experimented with.
Influence platform strategy and engineering culture across a global organization.
Competitive compensation, flexible work options, and strong career growth opportunities.
Posting Dates:
May 14, 2026 - May 27, 2026
Caterpillar is an Equal Opportunity Employer. Qualified applicants of any age are encouraged to apply.
Not ready to apply? Join our Talent Community.