The Opportunity
We’re looking for a Senior Data Engineer to architect and scale the data backbone powering next-generation AI models in robotics and real-world environments.
This role sits at the intersection of distributed systems, multimodal data processing, and applied machine learning, with a strong focus on building high-quality datasets for robotic foundation models. You will ensure that data pipelines, infrastructure, and data strategy directly translate into measurable improvements in model performance.
Your Responsibilities
- Drive the model–data loop by connecting application requirements with data collection, and translating model failures into data-driven improvements through collection, curation, and augmentation
- Build and scale distributed data pipelines (Ray/Anyscale or similar) for TB-scale video, sensor, and robotics datasets
- Design multimodal data schemas aligning video, actions, and high-frequency sensor streams
- Develop Python tooling for data quality, including cleaning, anomaly detection, and dataset versioning
- Own dataset quality and coverage, including annotation workflows, data diversity, and storage trade-offs
- Lead a small team and coordinate with data providers and annotation vendors
- Oversee real-world data collection, including technical setup, compliance, and secure data handling
Technologies
- Python (advanced, production-grade)
- Ray / Anyscale or Apache Spark
- AWS / GCP for large-scale data and GPU training pipelines
- Video and sensor data formats (H.264/H.265, ROS bags, MCAP)
- PyTorch, NumPy
- DVC, LakeFS or similar data versioning tools
- Distributed data processing and storage systems
Requirements
Must Have
- 5+ years in Data/ML Engineering, including 2+ years in a senior or lead role
- Experience with large-scale real-world data (robotics, autonomous systems, or video AI)
- Strong experience with Ray/Anyscale or Spark for distributed pipelines
- Advanced Python (performance, concurrency, ML stack like NumPy/PyTorch)
- Experience working with video and sensor data formats (e.g., H.264/H.265, ROS bags, MCAP)
- Experience building scalable data pipelines for GPU-based training workloads (AWS/GCP)
- Experience with data versioning tools such as DVC or LakeFS
- Proven experience owning systems and mentoring engineers
Nice to Have
- Experience building datasets for multimodal foundation models (VLA, VLM or similar)
- Robotics fundamentals (sensor synchronization, 3D transforms)
- Experience with active learning or data-centric ML workflows
Benefits
- Competitive compensation package
- Various employee subsidies and perks, including public transportation and Wellpass
- Work with a world-class team in a flat hierarchy, with direct collaboration alongside the founders and engineering team
- Opportunity to make a real impact by working on cutting-edge robotics and AI systems
- Fast growth potential in a rapidly evolving company and industry
- International office environment with English as the official working language
Recruiting Process
Your recruiting partner for this role is Madhulika (she/her). You can expect a screening call and up to 4 rounds of interviews, including an onsite visit to our office in Munich to meet the team.
We hire across backgrounds, identities, and experiences, and we are committed to a workplace where everyone belongs. Discrimination has no place here.
If you need any accommodations during the recruiting process, just reach out to your recruiting partner.