Job Title: Data Engineer
Location: Pune
Job Type: Full-Time (WFO)
About TVARIT
TVARIT GmbH specializes in developing and delivering cutting-edge artificial intelligence (AI) solutions for the metal industry, including steel, aluminum, copper, and cast iron. Our software products empower customers to make intelligent, data-driven decisions, driving advancements in areas such as Predictive Quality (PsQ), Predictive Maintenance (PdM), and Energy Consumption Reduction (PsE).
With a strong portfolio of renowned reference customers, state-of-the-art technology, a talented research team from prestigious universities, and esteemed awards such as the EU Horizon 2020 AI Prize, TVARIT is recognized as one of the most innovative AI companies in Germany and Europe.
We are seeking a self-motivated individual with a positive "can-do" attitude and excellent oral and written communication skills in English to join our team.
Job Description
We are looking for a Data Engineer with strong expertise in Azure Databricks, PySpark, and distributed computing to develop and optimize scalable ETL pipelines for manufacturing analytics. The role involves working with high-frequency industrial data to enable real-time and batch data processing.
Key Responsibilities
Build scalable real-time and batch processing workflows using Azure Databricks, PySpark, and Apache Spark.
Perform data pre-processing, including cleaning, transformation, deduplication, normalization, encoding, and scaling, to ensure high-quality input for downstream analytics.
Design and maintain cloud-based data architectures, including data lakes, lakehouses, and warehouses, following the Medallion Architecture.
Deploy and optimize data solutions on Azure (preferred), AWS, or GCP, with a focus on performance, security, and scalability.
Develop and optimize ETL/ELT pipelines for structured and unstructured data from IoT, MES, SCADA, LIMS, and ERP systems.
Automate data workflows using CI/CD and DevOps best practices, ensuring security and compliance with industry standards.
Monitor, troubleshoot, and enhance data pipelines for high availability and reliability.
Use Docker and Kubernetes for scalable data processing.
Collaborate with the automation team, data scientists, and engineers to provide clean, structured data for AI/ML models.
Desired Skills and Qualifications
Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
Minimum 2 years of experience in data engineering, with a strong focus on cloud platforms such as Azure (preferred), AWS, or GCP.
Proficiency in PySpark, Azure Databricks, Python, and Apache Spark.
Expertise in relational databases (e.g., SQL Server, PostgreSQL), time-series databases (e.g., InfluxDB), and NoSQL databases (e.g., MongoDB, Cassandra).
Experience with containerization (Docker, Kubernetes).
Strong analytical and problem-solving skills with attention to detail.
Good to have: knowledge of MLOps, DevOps, and model lifecycle management.
Excellent communication and collaboration skills, with a proven ability to work effectively in a team.
Comfortable working in a dynamic, fast-paced startup environment, adapting quickly to changing priorities and responsibilities.