Role Overview
We are looking for a Data Engineer to design, build, and maintain scalable data systems and infrastructure. In this role, you will collaborate with cross-functional teams to gather requirements, develop data pipelines, and implement best practices for data governance, security, and analytics. This position offers the opportunity to shape a core part of our data environment and directly influence how we leverage data for business insights and innovation.
Key Responsibilities
Architect and Implement Data Solutions
Design and build scalable data platforms using cloud (AWS, Azure, or GCP) or on-premises technologies.
Develop best practices for data storage, ingestion, and processing (batch and streaming).
Data Pipeline Development
Create and manage robust ETL/ELT workflows handling various data types (structured, semi-structured, unstructured).
Optimize data pipelines for reliability, scalability, and performance.
Data Governance and Security
Define and enforce governance policies including data retention, cataloging, and lineage tracking.
Ensure compliance with relevant data privacy regulations (GDPR, CCPA, HIPAA, etc.) through appropriate security controls and encryption.
Metadata Management and Cataloging
Implement and maintain data catalog solutions (e.g., AWS Glue, Apache Atlas, Collibra, or Alation).
Automate the detection of new data sets, schema changes, and lineage updates.
Data Quality and Monitoring
Establish automated checks and alerts for data quality, completeness, and consistency.
Troubleshoot and resolve data-related issues, providing root cause analysis and long-term fixes.
Collaboration and Mentorship
Work closely with cross-functional stakeholders to translate requirements into scalable technical solutions.
Qualifications
Education & Experience
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
4+ years of hands-on experience in data engineering, data infrastructure, or related fields.
Technical Skills
Proven experience with cloud platforms (AWS, Azure, or GCP) and object storage (S3, ADLS, GCS).
Proficiency in distributed data processing tools like Apache Spark or Databricks.
Strong SQL skills and experience with one or more programming languages (Python, Scala, Java).
Familiarity with real-time data streaming solutions (Kafka, Kinesis, etc.).
Knowledge of data modeling and modern data architecture patterns.
Data Governance & Security
Track record of implementing data catalog and lineage solutions.
Experience with data quality frameworks (e.g., Great Expectations, Deequ) is a plus.
Understanding of RBAC, ABAC, encryption, and other security best practices.
Soft Skills
Excellent communication and stakeholder management skills.
Ability to explain complex technical concepts to both technical and non-technical audiences.
Experience with agile methodologies and cross-functional collaboration.
Nice-to-Have
Certifications in AWS, Azure, or GCP (e.g., AWS Certified Data Analytics – Specialty, Azure Data Engineer).
Experience with infrastructure-as-code (Terraform, CloudFormation).
Familiarity with orchestration tools like Airflow, Luigi, or Dagster.
Experience with containerization and CI/CD for data engineering pipelines.