ML Infrastructure Engineer
Company: OpenAI
Location: San Francisco
Posted on: March 25, 2025
Job Description:
The Runtime team builds the low-level framework components that
power our ML training systems. We build robust, scalable,
high-performance components to support our distributed training
workloads. Our priorities are to maximize the productivity of our
researchers and our hardware, with the goal of accelerating
progress towards AGI.

About the Role
As an ML Infrastructure Engineer, you will work on improving the
training throughput of our internal training framework while
enabling researchers to experiment with new ideas. This requires
good engineering (for example, designing, implementing, and
optimizing state-of-the-art AI models), writing bug-free machine
learning code (surprisingly difficult!), and acquiring deep
knowledge of the performance of supercomputers. In all the
projects this role pursues, the ultimate goal is to push the field
forward.

We're looking for people who love optimizing performance and
understanding distributed systems, and who cannot stand having
bugs in their code. Since our training framework is used for large
runs across very large numbers of GPUs, performance improvements
here have a large impact.

This role is based in San Francisco, CA. We use a hybrid work
model of 3 days in the office per week and offer relocation
assistance to new employees.

In this role, you will: