Troubleshoot and resolve highly technical issues across the Google Cloud AI/ML portfolio, focusing on customer-reported deployment failures, model performance degradation, and infrastructure-related problems.
Work directly with customers on their ML deployments (including Generative AI models) to ensure production readiness and high availability.
Utilize coding and scripting skills (primarily Python) to read, debug, and reproduce customer issues within their ML models (TensorFlow, PyTorch) or deployment environments (Kubernetes, Compute Engine).
Manage customer problems through effective diagnosis, clear documentation, and the development and implementation of new investigation tools to increase diagnostic speed.
Develop an in-depth understanding of Google Cloud's AI/ML solutions and share this knowledge to upskill the wider global support organization.
Participate in an on-call rotation, which may include working non-standard hours, nights, or weekends as part of our global 24/7 support model.
Minimum qualifications:
Bachelor's degree in Computer Science, Engineering, Mathematics, a related technical field, or equivalent practical experience.
5 years of experience in a technical role such as Technical Support, Software Engineering, or Solutions Engineering.
Experience coding in one or more general-purpose languages (e.g., Python, Java, Go, C, or C++), including data structures, algorithms, and software design.
Experience with Artificial Intelligence (AI) concepts and Machine Learning (ML) techniques.
Experience with computer networking (e.g., TCP/IP, DNS, Load Balancing, routing) and Linux/Unix system administration.
Preferred qualifications:
Professional-level certification on Google Cloud, such as the Professional Machine Learning Engineer or Professional Cloud Architect.
Experience with Google Cloud's AI/ML product portfolio, including Vertex AI (Vertex AI Workbench, Pipelines, Endpoints, TensorBoard) and Generative AI tools (Gemini, Gen AI Studio).
Experience in specialized ML areas like Natural Language Processing (NLP), Computer Vision, or Recommendation Systems.
Experience with public cloud infrastructure and core services (e.g., Compute Engine, Cloud Storage, BigQuery).
Knowledge of ML frameworks such as TensorFlow, Keras, or PyTorch.
Ability to lead the design and implementation of AI-based solutions or debugging tools, demonstrating strong collaboration skills.