Software Engineer, Evals Infrastructure (Preparedness)
Company: OpenAI
Location: San Francisco
Posted on: April 1, 2025
Job Description:
Safety Systems - San Francisco

About the Team

The Safety Systems team is responsible for the safety work that
ensures our best models can be safely deployed to the real world to
benefit society. The team is at the forefront of OpenAI's mission to
build and deploy safe AGI, driving our commitment to AI safety and
fostering a culture of trust and transparency.

Frontier AI models
have the potential to benefit all of humanity, but also pose
increasingly severe risks. To ensure that AI promotes positive
change, the Preparedness team helps us prepare for the development
of increasingly capable frontier AI models. This team is tasked
with identifying, tracking, and preparing for catastrophic risks
related to frontier AI models.

Specifically, the mission of the Preparedness team is to:
- Closely monitor and predict the evolving capabilities of
frontier AI systems, with an eye towards misuse risks whose impact
could be catastrophic (not necessarily existential) to our
society;
- Ensure we have concrete procedures, infrastructure, and
partnerships to mitigate these risks and, more broadly, to safely
handle the development of powerful AI systems.

Our team will tightly
connect capability assessment, evaluations, and internal red
teaming for frontier models, as well as overall coordination on AGI
preparedness. The team's core goal is to ensure that we have the
infrastructure needed for the safety of highly capable AI
systems, from the models we develop in the near future to those with
AGI-level capabilities.

About the Role

As OpenAI continues to grow,
we are looking for experienced, problem-solving engineers to ensure
our systems scale. Our success depends on our ability to quickly
iterate on products while also ensuring that they are performant
and reliable. You will work in a deeply iterative, collaborative,
fast-paced environment to bring our technology to millions of users
around the world, and ensure it's delivered with safety and
reliability in mind. Successful candidates will play a crucial role
in ensuring the reliability, scalability, and performance of our
systems as we continue to expand. As a reliability expert, you will
be at the forefront of maintaining and enhancing the stability,
scalability, and performance of our rapidly evolving
infrastructure. You will work closely with cross-functional teams,
including software engineers, product managers, and data
scientists, to build and maintain resilient systems that can handle
our growing user base and workload.

In this role, you will:
- Work on scaling our infrastructure to support a wide variety of
evaluations, supporting systems, and automation.
- Collaborate with development teams to make our systems more
reliable (owning Production Readiness Reviews).
- Implement and manage monitoring systems to proactively identify
issues and anomalies in our production environment.
- Develop and maintain service level objectives (SLOs) and
service level indicators (SLIs) to measure and ensure system
reliability.
- Implement fault-tolerant and resilient design patterns to
minimize service disruptions.
- Build and maintain automation tools to streamline repetitive
tasks and improve system reliability.
- Partner with engineers and researchers at OpenAI to help bring
frontier research capabilities to the world.
- Participate in an on-call rotation to respond to critical
incidents and ensure 24/7 system availability.

You might thrive in this role if you:
- Enjoy seeking out and addressing bottlenecks and areas for
performance improvement in our systems.
- Utilize Infrastructure as Code (IaC) principles to automate
infrastructure provisioning and configuration management.
- Are experienced in collaborating with cross-functional teams to
ensure that reliability and scalability are considered in the
design and development of new features and services.
- Have a track record of accelerating engineering reliability by
empowering your fellow engineers with excellent tooling and
systems.
- Help create a diverse, equitable, and inclusive culture that
makes everyone feel welcome while enabling radical candor and the
challenging of groupthink.
- Have a humble attitude, an eagerness to help your colleagues,
and a desire to do whatever it takes to make the team succeed.
- Own problems end-to-end, and are willing to pick up whatever
knowledge you're missing to get the job done.

Qualifications:
- Bachelor's degree in Computer Science, Information Technology,
or a related field (or equivalent work experience).
- At least 7 years of professional software engineering
experience.
- Proven experience as a reliability engineer or a similar role
in a fast-paced, rapidly scaling company.
- Strong proficiency in cloud infrastructure.
- Proficiency in programming/scripting languages.
- Experience with containerization technologies and container
orchestration platforms like Kubernetes.
- Knowledge of IaC tools such as Terraform or
CloudFormation.
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
- Experience with observability tools such as Datadog,
Prometheus, Grafana, Splunk, and the ELK stack.
- Experience with microservices architecture and service mesh
technologies.
- Knowledge of security best practices in cloud environments.

This role is exclusively based in our San Francisco HQ. We offer
relocation assistance to new employees.

About OpenAI

OpenAI is an AI
research and deployment company dedicated to ensuring that
general-purpose artificial intelligence benefits all of humanity.
We push the boundaries of the capabilities of AI systems and seek
to safely deploy them to the world through our products. AI is an
extremely powerful tool that must be created with safety and human
needs at its core, and to achieve our mission, we must encompass
and value the many different perspectives, voices, and experiences
that form the full spectrum of humanity.

We are an equal opportunity
employer and do not discriminate on the basis of race, religion,
national origin, gender, sexual orientation, age, veteran status,
disability, or any other legally protected status.

OpenAI Affirmative Action and Equal Employment Opportunity Policy
Statement

For US-based candidates: Pursuant to the San Francisco
Fair Chance Ordinance, we will consider qualified applicants with
arrest and conviction records.

We are committed to providing
reasonable accommodations to applicants with disabilities, and
requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the
potential to help people solve immense global challenges, and we
want the upside of AI to be widely shared. Join us in shaping the
future of technology.

Compensation

$310K + Offers Equity