Senior Data Engineer (F/M/D)

Animore
Munich

Job details

Full-time

Qualifications

PyTorch
Spark
NumPy
English
Google Cloud Platform
AWS
Python

Full job description

The Opportunity

We’re looking for a Senior Data Engineer to architect and scale the data backbone powering next-generation AI models in robotics and real-world environments.

This role sits at the intersection of distributed systems, multimodal data processing, and applied machine learning, with a strong focus on building high-quality datasets for robotic foundation models. You will ensure that data pipelines, infrastructure, and data strategy directly translate into measurable improvements in model performance.

Your Responsibilities

- Drive the model–data loop by connecting application requirements with data collection, and translating model failures into data-driven improvements through collection, curation, and augmentation
- Build and scale distributed data pipelines (Ray/Anyscale or similar) for TB-scale video, sensor, and robotics datasets
- Design multimodal data schemas aligning video, actions, and high-frequency sensor streams
- Develop Python tooling for data quality, including cleaning, anomaly detection, and dataset versioning
- Own dataset quality and coverage, including annotation workflows, data diversity, and storage trade-offs
- Lead a small team and coordinate with data providers and annotation vendors
- Oversee real-world data collection, including technical setup, compliance, and secure data handling

Technologies

- Python (advanced, production-grade)
- Ray / Anyscale or Apache Spark
- AWS / GCP for large-scale data and GPU training pipelines
- Video and sensor data formats (H.264/H.265, ROS bags, MCAP)
- PyTorch, NumPy
- DVC, LakeFS or similar data versioning tools
- Distributed data processing and storage systems

Requirements

Must Have

- 5+ years in Data/ML Engineering, including 2+ years in a senior or lead role
- Experience with large-scale real-world data (robotics, autonomous systems, or video AI)
- Strong experience with Ray/Anyscale or Spark for distributed pipelines
- Advanced Python (performance, concurrency, ML stack like NumPy/PyTorch)
- Experience working with video and sensor data formats (e.g., H.264/H.265, ROS bags, MCAP)
- Experience building scalable data pipelines for GPU-based training workloads (AWS/GCP)
- Experience with data versioning tools such as DVC or LakeFS
- Proven experience owning systems and mentoring engineers

Nice to Have

- Experience building datasets for multimodal foundation models (VLA, VLM or similar)
- Robotics fundamentals (sensor synchronization, 3D transforms)
- Experience with active learning or data-centric ML workflows

Benefits

- Competitive compensation package
- Various employee subsidies and perks, including public transportation and Wellpass
- Work with a world-class team in a flat hierarchy, with direct collaboration alongside the founders and engineering team
- Opportunity to make a real impact by working on cutting-edge robotics and AI systems
- Fast growth potential in a rapidly evolving company and industry
- International office environment with English as the official working language

Recruiting Process

Your recruiting partner for this role is Madhulika (she/her). You can expect a screening call and up to four rounds of interviews, including an on-site visit to our office in Munich to meet the team.

We hire across backgrounds, identities, and experiences, and we are committed to a workplace where everyone belongs. Discrimination has no place here.

If you need any accommodations during the recruiting process, just reach out to your recruiting partner.
