We are looking for a Senior Distributed Systems Engineer to design and build a scalable, event-driven platform that coordinates distributed workloads across multiple environments. This role involves developing a central orchestration engine and a lightweight execution agent framework, enabling reliable job distribution, scheduling, and execution. The system is powered by Apache Kafka, and you will design robust messaging pipelines, manage workflow orchestration, handle versioned artifacts, and ensure consistency across distributed components. You will work on challenges such as dependency resolution, parallel execution, state management, and safe rollout of updates, while maintaining high performance and system reliability. Strong programming expertise in Python is required for building orchestration logic, agents, and integration layers.

The ideal candidate has deep expertise in Apache Kafka, including topic design, partitioning strategies, consumer group scaling, and message delivery guarantees. You should have strong experience building fault-tolerant distributed systems that handle failures such as network disruptions and partial execution states through retries and idempotent operations. Hands-on experience with Python-based backend systems, along with familiarity with containerized environments (Docker, Kubernetes) and event-driven architectures, is essential. You should be comfortable designing secure, efficient agents that operate across network boundaries, and you should think in terms of systems, reliability, observability, and scalability from day one.