Provide technical leadership on high-impact projects. Develop an understanding of various parts of Google’s ML training and serving stack.
Influence and coach a distributed team of engineers.
Identify opportunities to improve the efficiency of the ML fleet, and build the solutions and capabilities needed to realize them. Facilitate alignment and clarity across teams on goals, outcomes, and timelines.
Build and expand the ML infrastructure and platform to support the future needs of internal and external Google Cloud Platform (GCP) customers.
Drive collaboration with teams within the CoreML organization and across product areas, as needed, to build solutions with other components of ML infrastructure and to achieve the efficiency improvement goals.
Minimum qualifications:
Bachelor's degree or equivalent practical experience.
8 years of experience programming in C++.
5 years of experience testing and launching software products.
5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture.
3 years of experience with software design and architecture.
Preferred qualifications:
Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
8 years of experience with data structures and algorithms.
3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.
Ability to learn and apply concepts in Large Language Model (LLM) training and serving, ML frameworks, and TPU/GPU architecture.
Passion for building infrastructure to increase the velocity of ML development.