Data Engineer – Job Description
Role Summary
The Data Engineer is responsible for designing, building, and maintaining scalable data pipelines, data platforms, and integration solutions across cloud environments. This role focuses on transforming raw data into reliable, high-quality datasets to support analytics, reporting, and AI/ML use cases.
Key Responsibilities
Data Engineering & Pipeline Development
Design, develop, and maintain ETL/ELT pipelines for data ingestion, transformation, and loading
Build scalable data workflows using batch and real-time processing frameworks
Develop and optimize data pipelines for performance, reliability, and scalability
Handle structured and unstructured data across multiple sources
Data Platform & Cloud Implementation
Work with cloud platforms (Azure / AWS / GCP) to build and manage data solutions
Use cloud-native services such as:
Data lakes, warehouses, and lakehouse platforms
Distributed compute (e.g., Spark, Databricks, Synapse)
Support deployment and management of data infrastructure and storage systems
Data Integration & Transformation
Integrate data from multiple systems including APIs, databases, applications, and streaming sources
Implement transformation logic using SQL, PySpark, or other data processing tools
Ensure consistency and accuracy across data pipelines
Data Quality, Governance & Security
Implement data validation, cleansing, and quality checks
Ensure compliance with data governance, privacy, and security policies (PII/PHI handling)
Maintain data lineage, metadata, and documentation
Monitoring, Optimization & Reliability
Monitor data pipelines and workflows for failures and performance issues
Implement logging, alerting, and troubleshooting mechanisms
Optimize pipelines for cost, speed, and resource utilization
Collaboration & Support
Work closely with data architects, analysts, and business stakeholders to understand requirements
Support analytics, BI, and AI teams with clean and reliable datasets
Participate in code reviews, testing, and deployment processes
Documentation & Best Practices
Document data flows, pipeline logic, and technical designs
Follow best practices for data modeling, schema design, and version control
Maintain reusable components and frameworks
Required Experience
3–8 years of experience in Data Engineering or related roles
Strong experience with:
ETL/ELT tools and frameworks
SQL and data modeling concepts
Python / PySpark / Scala (at least one)
Hands-on experience with:
Cloud platforms (Azure / AWS / GCP)
Big data tools (Spark, Databricks, Synapse, etc.)
Experience with data streaming tools (Kafka/Event Hubs) is a plus
Understanding of CI/CD and DevOps practices