About the role
We are looking for a Senior AI/ML Engineer who can take AI features from prototype to production with
confidence. You will own the full lifecycle of our LLM-powered systems — from benchmarking model and
pipeline performance, to hardening the stack for scale, to shipping it live to real users. This role sits at the
intersection of applied LLM/GenAI work and MLOps, and is critical to how quickly and reliably we can put new AI
capabilities in front of customers.
You will work closely with product, backend, and design to make sure what we ship is fast, accurate,
cost-efficient, and observable in production.
What you will do
• Design, build, and ship LLM-powered features end-to-end — including RAG pipelines, agentic workflows,
prompt orchestration, and fine-tuning where it makes sense.
• Define and run benchmarking frameworks for our AI applications: latency, throughput, accuracy,
hallucination rate, cost per request, and quality regressions across model and prompt changes.
• Establish offline evals (golden sets, LLM-as-judge, human-in-the-loop) and online evals (A/B tests,
shadow traffic, canary releases) that every model or prompt change must pass before full rollout.
• Take models and pipelines to production: containerize, deploy, autoscale, and monitor inference
services with clear SLOs for latency, error rate, and cost.
• Build the MLOps backbone — CI/CD for models and prompts, versioning, feature stores where needed,
observability (traces, metrics, logs), and rollback paths.
• Optimize inference performance and cost: batching, caching, quantization, distillation, model routing,
and making the right trade-offs between managed and self-hosted deployments.
• Partner with product to translate fuzzy product asks into measurable AI quality bars, and own the “is
this good enough to ship?” decision with data behind it.
• Mentor other engineers on LLM best practices, eval rigor, and production readiness.
What we are looking for
• 3–6 years of engineering experience, with a meaningful portion spent shipping ML or AI systems to
production (not just notebooks or POCs).
• Strong hands-on experience with LLMs and GenAI: at least one production system using OpenAI /
Anthropic / open-source models, plus practical experience with RAG, embeddings, vector stores, and
prompt engineering.
• Solid MLOps foundation — model serving (FastAPI, vLLM, Triton, SageMaker, or similar),
containerization (Docker, Kubernetes), and at least one cloud (AWS, GCP, or Azure).
• Demonstrated ability to benchmark systems rigorously: you can talk concretely about how you
measured a model’s quality and performance, what you optimized, and what you knowingly traded off.
• Strong Python skills; comfortable with PyTorch or TensorFlow, and with frameworks like LangChain,
LlamaIndex, or equivalent (or a clear point of view on why not to use them).
• Good engineering discipline: testing, code review, clear API design, and the instinct to add observability
before it is needed.
• Comfortable owning the path to production — you have taken something live, watched it break, and
fixed it, and you do not need a separate team to do that for you.
Bonus points
• Experience fine-tuning or post-training open-source models (LoRA/QLoRA, DPO, RLHF).
• Experience with multimodal models (image, video, or audio generation/understanding).
• Experience building or contributing to an internal eval harness or LLM observability tooling.
• Experience with high-QPS, low-latency inference at consumer scale.
• Open-source contributions or technical writing in the AI/ML space.
What success looks like in your first 6 months
• You have shipped at least one customer-facing AI feature to production, fully owned by you.
• A benchmarking and evals framework is in place, run on every model or prompt change, with results
visible to the team.
• Production AI services have clear SLOs, dashboards, and alerting — and you can answer “what is this
costing us per 1,000 requests?” without thinking.
• The team’s velocity on shipping AI features has measurably increased because of the infrastructure and
patterns you put in place.
Why Zocket
Zocket is building the AI layer for marketing — letting any business create high-performing ads, creatives, and
campaigns in minutes instead of weeks. AI is not a side project here; it is the product. You will work on systems
that thousands of businesses use every day, with a tight team, fast feedback loops, and a real mandate to ship.
How to apply
Send your resume and links to anything you have shipped (GitHub, projects, papers, demos) to
careers@zocket.com. If you have taken an AI system from prototype to production and have a story about how
you knew it was ready, lead with that.