resu·mail

Research Data Engineer

at Smallest

Bengaluru, India | Mid | Posted 2026-04-07

Don't apply into the void — reach the hiring manager

ResuMail finds the recruiters and hiring managers behind this Research Data Engineer role at Smallest, drafts a personalised outreach email, and schedules the send — so your application actually gets seen.

About this role

Research Data Engineer (India), Smallest.ai

About the Role

This is not a typical data engineering role. You won’t be building dashboards. You won’t be maintaining pipelines no one touches. You will take messy, noisy, real-world data and turn it into something models can learn from. Think of this as running a gold mine: you take dust and convert it to gold.

We work on speech, language, and real-time systems across 50+ languages. The difference between a good model and a great one is almost always data quality + data systems. That’s where you come in.

What You’ll Work On

Data Pipelines (Real-time + Batch)
  - Build high-throughput pipelines for audio, text, and multimodal data
  - Streaming + offline processing at scale

Data Quality & Curation
  - Cleaning, filtering, deduplication, normalization (numbers, emails, code-mix, etc.)
  - Designing heuristics + ML-based data filtering systems

Multilingual Data Systems
  - Handling 50+ languages, accents, and code-mixed inputs
  - Language-aware normalization and segmentation

Training Data Engine
  - Build pipelines that continuously generate better training data from production
  - Active learning loops, data selection, sampling strategies

Evaluation & Benchmarking Pipelines
  - Create scalable eval datasets across languages and domains
  - Automate quality tracking for ASR, TTS, and conversational systems

Data Infra for Research
  - Work closely with the research team to unblock experiments fast
  - Build systems that reduce iteration time from weeks → hours

What This Role Is NOT
  - Not a dashboard/reporting role
  - Not a “move data from A to B” role
  - Not a maintenance-heavy legacy pipeline role

What We’re Looking For
  - Strong fundamentals in data structures, systems, and pipelines
  - Experience with large-scale data processing (audio/text preferred)
  - Comfortable with messy, unstructured, real-world data
  - Strong coding skills (Python required; systems experience is a plus)
  - Understanding of ML/data pipelines (training, eval, data curation)

Bonus (Not Mandatory)
  - Experience with speech/audio data (ASR/TTS)
  - Familiarity with multilingual datasets
  - Experience with streaming systems (Kafka, etc.)
  - Exposure to data-centric AI / data quality frameworks

How We Work
  - Speed over perfection
  - Production over papers
  - Systems that scale, not scripts that barely work
  - Tight loop between data → model → eval → improvement

Who This Is For
  - You enjoy working with raw, chaotic data
  - You care about data quality more than tooling hype
  - You like building systems that directly impact model performance
  - You get excited by turning unusable data into competitive advantage

Why Join Us

We’re building real-time, multilingual voice AI systems. At this level, models are only as good as the data behind them. If you want to work on the layer that actually moves the needle, this is it.

How to get this job at Smallest

  1. Don't rely on the portal. Cold applications for a role like Research Data Engineer land in a pile of hundreds. A direct, personalised message to the hiring manager or a referrer is the fastest way in.
  2. Find the right person. ResuMail surfaces the actual recruiters and hiring managers at Smallest — not a generic careers inbox.
  3. Send tailored outreach. ResuMail drafts an email personalised to your resume and this role, then paces and schedules sends so you stay out of spam.
  4. Follow up. One polite nudge after 5–7 days roughly doubles reply rates — scheduled for you.

Reach Smallest's hiring managers today.

Free to start. No credit card. Built for Indian job seekers.
