Autonomize AI is building the intelligence layer for healthcare. Our Genesis platform replaces brittle, manual knowledge workflows with AI agents that reason, retrieve, and act, reducing administrative burden so clinicians can focus on patients.
We are looking for a Quality Assurance Engineer who doesn't just check boxes: someone who thinks in agents, designs for edge cases, and treats LLM reliability as an engineering discipline.
What This Role Is
You won't just be testing "happy paths" on a UI. You will be validating the brain of the Genesis stack: agentic pipelines, RAG systems, and clinical-facing interfaces.
This is not a role where you just verify CRUD operations. You’ll be responsible for defining how intelligent systems are evaluated, regression-tested, and monitored for accuracy in a high-stakes, regulated industry. You will work directly with key engineering and delivery leaders to ensure our AI agents behave predictably even when the data is unstructured.
You’ll Thrive Here If
AI-Native Quality is your default mode
You understand that "passed" means more than a 200 OK; it means the LLM response was accurate, grounded, and relevant.
You’ve tested production systems where LLMs do real work, not just simple chatbots.
You can identify hallucinations, drift, and prompt regressions as easily as you spot a UI bug.
You have a basic understanding of RAG pipelines, multi-turn interactions, and how context length affects output quality.
You build robust testing infrastructure
4+ years of experience in Quality Assurance and software testing.
Deep Python proficiency; you are comfortable using pytest, Selenium, or similar frameworks to automate the heavy lifting, including developing and running robust AI evaluation (AI evals) harnesses.
Experience with AI red-teaming methodologies to discover vulnerabilities, safety risks, and adversarial prompt attacks against LLM agents.
Experience validating REST APIs that serve both humans and AI agents (structured outputs, streaming, and tool schemas).
Async-first thinking: You understand how to test event-driven architectures, task queues, and real-time data movement.
You operate at cloud scale
Comfortable with Docker and Kubernetes environments; you know how to pull logs and navigate a containerized stack.
Experience with at least one public cloud (AWS, Azure, or GCP).
You’ve been part of a release cycle and know the importance of a clean deploy.
Bonus: You’ve Done This Before
Built evaluation harnesses or used LangSmith, MLflow, or similar tools for AI observability.
Experience in the healthcare domain (FHIR, HL7, or clinical workflows).
Familiarity with CI/CD pipelines and automating quality gates.
Experience validating data-heavy systems using Postgres, Elasticsearch, or graph databases.
What We Value Above Credentials
Bias toward action – You identify a gap in coverage and fix it. You learn by breaking and building, not just reading docs.
Owner mentality – You don’t wait for a handoff. You own the quality of the feature from local dev to production.
Intellectual honesty – You’d rather say "I don’t know why this LLM is hallucinating, let me find out" than ignore an edge case.
Async-first communication – You write clear bug reports, document test plans, and work well across time zones.
What You’ll Get
VC-backed healthcare AI company growing fast.
Full-stack ownership – No silos, no "that’s not my team." You have the authority to stop a release if the quality isn't there.
Direct access to founders – Your feedback on product behavior will directly influence the roadmap.
Professional development budget – For certifications (like CKAD), conferences, and books.
Flexible culture built around output, not hours.