Senior Researcher - Efficient AI

Bengaluru, India | Senior | Posted 2026-05-03

About this role

Delivering AI experiences at scale requires solving some of the hardest efficiency challenges in modern AI systems.

Responsibilities

Formulate, develop, and evaluate new algorithmic and system-level approaches for end-to-end AI serving, using analytical modeling and large-scale measurement to study token-level latency, tail latency (p95/p99), throughput-per-dollar, cold-start behavior, warm pool strategies, and capacity planning under multi-tenant SLOs and variable sequence lengths.
Design and experimentally evaluate endpoint configuration and execution policies, including batching, routing, and scheduling strategies, tensor and pipeline parallelism, quantization and precision profiles, speculative decoding, and chunked or streaming generation, and drive the most promising approaches through robust rollout and validation into production.
Perform hardware- and kernel-aware optimization by collaborating closely with model, kernel, compiler, and hardware teams to align serving algorithms with attention/KV innovations and accelerator capabilities.
Build and benchmark experimental prototypes and run large-scale measurements to validate research ideas and drive them toward production readiness; produce clear technical documentation, design reviews, and operational playbooks.
Publish research results, file patents, and, where appropriate, contribute to open-source systems and serving frameworks.

Qualifications

Doctorate in a relevant field, OR Master's Degree in a relevant field AND 3+ years of related research experience, OR Bachelor's Degree in a relevant field AND 4+ years of related research experience, OR equivalent experience.
Demonstrated expertise in algorithmic optimization, parallel computing, queuing and scheduling theory, and practical request orchestration under strict SLO constraints.
Strong understanding of GPU architecture and memory hierarchies.
Proficiency in C++ and Python for high-performance systems, with strong code quality and profiling/debugging skills.
Proven record of research impact through publications and/or patents, and experience carrying ideas through to systems that operate at scale in real production environments.
Deep understanding of transformer inference efficiency techniques such as sharding strategies, attention optimizations, paged KV caches, speculative decoding, LoRA, sequence packing or continuous batching, and quantization.
3+ years of experience with machine learning frameworks (e.g., PyTorch, TensorFlow) and inference serving frameworks (e.g., vLLM, Triton Inference Server, TensorRT-LLM, ONNX Runtime, Ray Serve, DeepSpeed-MII).
3+ years of experience in GPU programming and optimization, with expert knowledge of CUDA, ROCm, Triton, PTX, CUTLASS, or similar GPU programming frameworks.
Background in cost and performance modeling, autoscaling, and multi-region deployment or disaster recovery.

How to get this job at Microsoft

Don't rely on the portal. Cold applications for a role like Senior Researcher - Efficient AI land in a pile of hundreds. A direct, personalised message to the hiring manager or a referrer is the fastest way in.
Find the right person. ResuMail surfaces the actual recruiters and hiring managers at Microsoft — not a generic careers inbox.
Send tailored outreach. ResuMail drafts an email personalised to your resume and this role, then paces and schedules sends so you stay out of spam.
Follow up. One polite nudge after 5–7 days roughly doubles reply rates — scheduled for you.

Reach Microsoft's hiring managers today.

Free to start. No credit card. Built for Indian job seekers.
