Role:
Senior Azure Data Engineer
Location:
India, Remote
Experience:
6–8 years
Algoworks
www.algoworks.com
About the Company
Algoworks is an award-winning artificial intelligence, engineering services and experience transformation firm with offices across the United States, Europe, South America and India. We bring together a global team of engineers, architects, designers, researchers and operators united by rigor, accountability and a commitment to delivering measurable results.
For over 20 years, Algoworks has partnered with Fortune 500 organizations across the Americas, Europe and Asia to define, build and run technology that drives meaningful business outcomes. Our work combines human-centered design, engineering excellence and AI-powered capabilities to solve complex challenges with clarity and precision. Innovation, particularly in the responsible application of AI, is embedded in how teams approach problem-solving and continuous improvement.
At Algoworks, growth is continuous and closely tied to impact. Teams collaborate across geographies and disciplines, strengthening outcomes through shared insight and collective expertise. The culture values transparency, open dialogue and an environment where every voice is heard and contribution is recognized.
Through collaboration, accountability and a focus on results, Algoworks operates at the intersection of technology and people, building not only advanced systems but strong global teams that elevate performance and create lasting impact.
Role overview
We are seeking a Senior Azure Data Engineer to design, build, and optimize scalable data pipelines using Azure cloud technologies.
This role focuses on developing robust data ingestion and transformation pipelines, implementing Delta Lake-based data architectures, and enabling high-quality curated datasets for downstream analytics and reporting. The ideal candidate will have strong expertise in PySpark, Azure Databricks, and Azure Data Factory, along with a deep understanding of data performance optimization and engineering best practices.
Key responsibilities:
1. Pipeline development
Build and maintain scalable data pipelines using Azure Databricks and Azure Data Factory.
Implement ingestion and transformation logic across Bronze (raw) and Silver (cleaned) layers.
Support batch and incremental data processing patterns.
2. Curated layer and data processing
Implement hydration, merge, and upsert logic using Delta Lake.
Ensure curated datasets meet business requirements and data quality standards.
Handle late-arriving data and incremental updates efficiently.
3. Performance and storage optimization
Optimize Delta Lake tables for performance and cost efficiency.
Select and tune appropriate storage formats (Parquet, Delta).
Apply partitioning, compaction, and file sizing strategies.
Tune Spark jobs for large-scale distributed data processing.
4. Downstream collaboration and data enablement
Collaborate with DWH and BI teams to support downstream data consumption.
Provide optimized datasets for Synapse and reporting workloads.
Support data validation, reconciliation, and consistency across Gold layer outputs.
5. Engineering best practices
Implement CI/CD practices for data pipelines and workflows.
Follow coding standards, documentation, and version control practices.
Support production troubleshooting, monitoring, and performance tuning.
Required skills and qualifications:
Bachelor’s degree in Computer Science, Engineering, or a related field.
6–8 years of experience in data engineering.
Strong expertise in:
PySpark and distributed data processing.
Azure Databricks (hands-on development and optimization).
Azure Data Factory for pipeline orchestration.
Deep knowledge of Delta Lake (merge, upsert, optimization techniques).
Strong SQL skills for data transformation and validation.
Experience handling large datasets in distributed environments.
Strong understanding of storage optimization (Parquet, Delta).
Tools and practices
Experience with Git and version control systems.
Familiarity with CI/CD pipelines for data workflows.
Understanding of data quality checks and validation techniques.
Experience working in Agile/Scrum delivery models.
Nice-to-have skills:
Experience supporting Synapse Dedicated SQL Pool.
Exposure to streaming or near real-time data pipelines.
Familiarity with data governance or metadata management tools.
Must-have skills:
6–8 years of data engineering experience.
Strong expertise in PySpark, Azure Databricks, Azure Data Factory, and Delta Lake.
Strong SQL skills and experience with large-scale distributed data processing.
Experience with CI/CD, Git, data quality validation, and performance optimization.
Soft skills and collaboration:
Strong analytical and problem-solving skills.
Ability to work independently on complex data pipelines.
Good communication and collaboration skills.
Proactive and ownership-driven mindset.
Desired attributes:
Strong attention to data quality and performance.
Continuous learning mindset for evolving cloud/data technologies.
Ability to work in fast-paced, data-intensive environments.
Interview process
Two rounds of discussion.