Lead Site Reliability Engineer, Observability - Remote
Company: Reporter Newspapers
Location: San Francisco
Posted on: March 29, 2025
Job Description:
The Meraki cloud supports millions of customer devices from 8
data centers around the world. Meraki's customer base has grown by
a factor of 2-3 every year, serving billions of HTTP requests per
day globally. Our customers depend on our products to run their
critical infrastructure of network switches, security appliances,
wireless APs and security cameras.
As SREs at Meraki, we are
responsible for building and growing the cloud that supports these
customers and their networks. As a Lead Site Reliability Engineer
on the Observability team you will lead the design, development and
operation of large-scale, secure observability systems that make
sure our services stay online and performant. We're a team of
passionate software engineers that value quality and customer
experience. Our team is based in the US and EMEA, and we embrace
hybrid and remote work.
Examples of projects our team works on:
- Design, deploy and scale our Prometheus architecture to handle
100+ million active series and beyond.
- Deploy and operate large, high-performance Elasticsearch
clusters holding 2000+ TB of data.
- Deploy and grow high-throughput data pipelines built on Kafka,
handling hundreds of thousands of events per second.
- Design and build an alerting system that allows engineering
teams to construct alerts from multiple data sources and alerting
workflows.
- Write libraries and APIs that give engineers self-service
access to our monitoring, logging, and other observability
systems.
- Use Terraform to deploy public and private cloud
infrastructure.
You are an ideal candidate if you:
- Have 5+ years of experience designing, deploying and operating
mid- to large-size distributed systems on VMs or bare metal machines
running Linux (we run Debian and Ubuntu).
- Have 2+ years of experience developing with languages like Ruby,
Python, Go, Scala, or Bash.
- Are excited by the challenge of solving difficult problems in
large distributed systems that deal with huge amounts of data.
- Want to work on a highly autonomous team that cares deeply
about quality and customer experience.
- Are curious, learn fast and feel comfortable diving into
unfamiliar code and systems to solve problems.
- Understand the value of observability and can work with other
teams to help them better monitor their services.
- Are willing to be part of a production on-call rotation.
- Have direct experience with the following technologies (or
similar): Elasticsearch, Logstash, Kibana (ELK) stack, Kafka,
Prometheus/Thanos/Cortex, Graphite, Ansible, Terraform,
Consul.
- Have strong experience in building out solutions based on
software engineering best practices.
Keywords: Observability,
Monitoring, SRE, Site Reliability Engineering, DevOps,
ElasticSearch, Logstash, Kibana, ELK, Grafana, Graphite,
Prometheus, Kafka, Snowflake, Ansible, Ruby, Terraform, Consul.
The
successful applicant may be performing work in FedRAMP High or IL-5
environments, and therefore, must be a U.S. Person (i.e. U.S.
citizen, U.S. national). This position may also perform work that
the U.S. government has specified can only be performed by a U.S.
citizen on U.S. soil.
At Cisco Meraki, we're challenging the status
quo with the power of diversity, inclusion, and collaboration. When
we connect different perspectives, we can imagine new
possibilities, inspire innovation, and release the full potential
of our people. We're building an employee experience that includes
appreciation, belonging, growth, and purpose for everyone.
Cisco is
an Affirmative Action and Equal Opportunity Employer and all
qualified applicants will receive consideration for employment
without regard to race, color, religion, gender, sexual
orientation, national origin, genetic information, age, disability,
veteran status, or any other legally protected basis. Cisco will
consider for employment, on a case-by-case basis, qualified
applicants with arrest and conviction records.