Software Engineer, AI Infrastructure (Training + Inference)
Company: WaveForms AI
Location: San Francisco
Posted on: March 19, 2025
Job Description:
Job Title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff

Who We Are
WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive.

Role Overview
The Software Engineer, AI Infrastructure (Training + Inference) will design, build, and optimize the infrastructure that powers our large-scale training and real-time inference pipelines. This role combines expertise in distributed computing, system reliability, and performance optimization. The candidate will collaborate closely with researchers, focusing on building scalable systems for novel multimodal training and maintaining uptime to deliver consistent results for real-time applications.

Key Responsibilities
- Infrastructure Development: Design and implement infrastructure
to support large-scale AI training and real-time inference with a
focus on multimodal inputs.
- Distributed Computing: Build and maintain distributed systems
to ensure scalability, efficient resource allocation, and high
throughput.
- Training Stability: Monitor and enhance the stability of
training workflows by addressing bottlenecks, failures, and
inefficiencies in large-scale AI pipelines.
- Real-time Inference Optimization: Develop and optimize
real-time inference systems to deliver low-latency, high-throughput
results across diverse applications.
- Uptime & Reliability: Implement tools and processes to maintain
high uptime and ensure infrastructure reliability during both
training and inference phases.
- Performance Tuning: Identify and resolve performance
bottlenecks, improving overall system throughput and response
times.
- Collaboration: Work closely with research and engineering teams to integrate infrastructure with AI workflows, ensuring seamless deployment and operation.

Required Skills & Qualifications
- Distributed Systems Expertise: Proven experience in designing
and managing distributed systems for large-scale AI training and
inference.
- Infrastructure for AI: Strong background in building and
optimizing infrastructure for real-time AI systems, with a focus on
multimodal data (audio + text).
- Performance Optimization: Expertise in optimizing resource
utilization, improving system throughput, and reducing latency in
both training and inference.
- Training Stability: Experience in troubleshooting and
stabilizing AI training pipelines for high reliability and
efficiency.
- Technical Proficiency: Strong programming skills (Python preferred), proficiency with PyTorch, and familiarity with cloud platforms (AWS, GCP, Azure).

Minimum Experience
- 4-5 years of relevant professional experience