Software Engineer, Data Infrastructure
Company: OpenAI
Location: San Francisco
Posted on: March 27, 2025
Job Description:
The Research Platform Analytics team designs, builds, and
operates the critical foundational data and analytics
infrastructure that enables research at OpenAI.
Our goal is one, and one only: accelerate the progress of research
towards AGI. We do this by owning a variety of observability and
analytics systems aimed at providing quality signals about our
research, and we own their entire lifecycle, from data production in
training workloads through ingestion, post-processing, and end-user
analytics products, all at large scale.
About the Role
As we scale up
with more researchers and engineers joining OpenAI, we seek a
pragmatic and passionate engineer with a strong focus on the
experience for both engineers and scientists who work with our large
data sets.
Our work involves building a generic data processing
platform that enables researchers to store, query, and process
petabyte-scale datasets efficiently. This includes developing and
maintaining large-scale stream and batch data pipelines, ensuring
our infrastructure scales to support ML workloads, and making
trade-offs to deliver impact quickly. We work across distributed
data systems, infrastructure, and observability, ensuring
reliability while moving fast.
You will find yourself at home if you
are comfortable with work such as scaling Kubernetes services,
debugging Kafka consumer lag, diagnosing distributed systems
failures, and developing new end-to-end data processing
pipelines, from raw data capture to analytics using Presto, Trino,
or Flink. A portion of this role involves hands-on infrastructure
work, including deploying and troubleshooting core services.
This
role is based in San Francisco, CA. We use a hybrid work model of 3
days in the office per week and offer relocation assistance to new
employees.
In this role, you will:
- Build and maintain large-scale stream and batch processing
pipelines (Kafka, Spark, Flink, Trino/Presto).
- Develop a general-purpose data processing platform for handling
massive datasets.
- Scale applications for ML research, ensuring smooth operation
as workloads grow.
- Ensure the security, integrity, and compliance of data
according to industry and company standards.
- Ensure our analytics and data platforms can scale reliably to
the next several orders of magnitude.
- Accelerate company productivity by empowering your fellow
engineers, researchers, and teammates with excellent data tooling
and systems, providing a best-in-class experience.
- Bring new features and capabilities to the world by partnering
with product engineers, trust & safety and other teams to build the
technical foundations.
- Like all other teams, we are responsible for the reliability of
the systems we build. This includes an on-call rotation to respond
to critical incidents as needed.
You might thrive in this role if
you have:
- Proficiency in Python and backend development, with experience
working in large codebases (monorepos).
- Experience building and operating large-scale stream and batch
processing pipelines (Kafka, Spark, Flink, Presto/Trino).
- Hands-on experience with Kubernetes, Terraform, and
deploying/troubleshooting production systems.
- Experience with access control, provenance, auditing, and large-scale
data movement.
- Passion for building systems that provide key insights,
especially in ML training workflows.
- Comfort in a fast-moving environment, making trade-offs to
deliver impact quickly.
- Understanding of data transformations in ML training and
inference workflows is a plus.
About OpenAI
OpenAI is an AI research
and deployment company dedicated to ensuring that general-purpose
artificial intelligence benefits all of humanity. We push the
boundaries of the capabilities of AI systems and seek to safely
deploy them to the world through our products. AI is an extremely
powerful tool that must be created with safety and human needs at
its core, and to achieve our mission, we must encompass and value
the many different perspectives, voices, and experiences that form
the full spectrum of humanity.
We are an equal opportunity employer
and do not discriminate on the basis of race, religion, national
origin, gender, sexual orientation, age, veteran status, disability
or any other legally protected status.
For US Based Candidates:
Pursuant to the San Francisco Fair Chance Ordinance, we will
consider qualified applicants with arrest and conviction records.
We
are committed to providing reasonable accommodations to applicants
with disabilities, and requests can be made via this link.
At
OpenAI, we believe artificial intelligence has the potential to
help people solve immense global challenges, and we want the upside
of AI to be widely shared. Join us in shaping the future of
technology.