Hearing is essential to how a robot understands and responds to the world. At NEURA Robotics, audio is a first-class modality: spoken instructions, contact sounds, and ambient cues all inform autonomous action. As an Audio AI Engineer, you own the real-time audio pipeline on the robot, the models that turn sound into meaning, and the voice interface that lets people speak to our humanoids the way they would to another person.
The role can emphasize conversational AI, audio ML modeling, or embedded audio DSP. We expect depth in at least one area and breadth across the others; you will lead where strongest and collaborate with AI and hardware teams on the rest.
Voice Interaction Stack: You build and own the hybrid edge-to-cloud pipelines for automatic speech recognition, text-to-speech, wake-word detection, voice activity detection, and natural language understanding that connect the human voice to our robot's cognitive core, optimizing for low latency, multi-speaker scenarios, and noisy real-world environments.
Audio Encoder Research: You design, train, and integrate audio encoders that feed our foundation models, and develop the ambient and contact-acoustic event recognition that gives our robot situational awareness.
Real-Time Audio Pipeline: You architect the shared audio substrate from microphones to model input (acquisition, denoising, beamforming, source separation, and tight synchronization with vision and proprioception streams) and optimize it for our on-robot compute and latency budgets.
Models, Data & Evaluation: You evaluate, fine-tune, and deploy state-of-the-art models across speech and general audio, drive data collection from real deployments, and build the evaluation infrastructure that turns recordings into measurable model improvements.
Sensor Strategy & Integration: You help select and qualify audio hardware (mic arrays, contact and tactile microphones, ADC frontends) with the hardware team, define calibration and mounting requirements, and ensure clean integration with the AI, hardware, and agentic stacks.
An excellent Master's or PhD in Computer Science, Electrical Engineering, Computational Linguistics, or a related field.
3+ years of professional experience in audio-related AI engineering.
A proven track record: your projects show measurable impact, whether through publications, shipped systems, or both.
Depth in at least one of the following, with curiosity and breadth across the others:
Conversational AI: ASR, TTS, NLP/NLU, dialogue systems, real-time speech systems with LLMs.
Audio ML modeling: audio representation learning, multimodal / VLA foundation models with an audio branch, generative audio.
Embedded audio DSP: real-time signal processing, mic-array processing, low-level audio I/O, quantized inference for on-device deployment.
Strong programming skills in Python; solid C/C++ a plus for real-time and on-device work.
Familiarity with ROS or robotics middleware is a plus.
Experience with agentic frameworks and LLM tool-use is a plus.
Experience with audio simulation, room acoustics, or spatial audio is a plus.
Hands-on experience setting up audio recording equipment for ML data collection (microphone selection, placement, calibration) is nice to have.
Team spirit, initiative, and the ability and willingness to explore new paths.
Excellent English skills; German is optional but welcome.