Manager, Site Reliability Engineering

Company:  Algolia
Location: Paris
Closing Date: 14/11/2024
Salary: £80 - £100 Per Annum
Type: Temporary
Job Requirements / Description
Algolia is set to enable every company to create world-class Search and Discovery experiences with an API-first approach. Performance and scalability are at the heart of our mission: we power over a trillion searches a year for 10K+ customers all over the world.

As Manager, Site Reliability Engineering in the Production Engineering team at Algolia, you will lead the Fleet team of Site Reliability Engineers responsible for the provisioning and global reliability of the Search Products at scale. Your team will focus on creating pragmatic solutions to optimize the availability and cost of the Search Products at scale, depending on the needs of customers, the Product teams, and the engineering teams that deliver a unique Search Experience to our customers.

You will be supported by experienced Individual Contributors to:
  • Deliver a migration orchestrator that has a huge impact on product scalability, reliability, and cost
  • Operate the Search Products
  • Run and improve our homemade Edge Load Balancer
  • Build, run, and improve a backup/restore system to ensure we meet our SLAs

YOUR ROLE WILL CONSIST OF:
  • Collaborating with senior leadership to define the overall technical direction and strategy for the organization, and ensuring that the SRE team's goals and initiatives are aligned with this strategy.
  • Building and maintaining strong relationships with stakeholders across the organization, representing the SRE organization in cross-functional meetings, and staying close to product and design teams so that the user experience is always top of mind.
  • Providing leadership, guidance, and mentorship to your team members, helping them develop their technical skills and knowledge of best practices in site reliability engineering.
  • Continuously evaluating and improving the performance of the SRE team, and identifying and implementing initiatives to drive operational excellence and improve overall service reliability.
  • Establishing and enforcing engineering processes and best practices that ensure high-quality, reliable, and scalable systems, and working with other teams to promote their adoption across the organization.
  • Defining and maintaining service level agreements (SLAs) and key performance indicators (KPIs) for your team's services, and working with other teams to ensure that these SLAs and KPIs are met.
  • Leading cross-functional efforts to resolve complex technical issues and mitigate operational risks across multiple teams and domains.
  • Designing and implementing, together with your team, monitoring, alerting, and metrics systems that ensure the availability, performance, and reliability of your team's services, and continuously refining these systems.
  • Collaborating with other technical teams to identify opportunities to automate processes, and designing and implementing automated tools and systems to support these processes.
  • Managing the budget for your team and ensuring that resources are used efficiently.
  • Documenting your team's projects and processes, and ensuring that this documentation is up to date and accessible to all stakeholders.

YOU MIGHT BE A FIT IF:
  • You have 4+ years of engineering management experience
  • You are fluent in Agile methodology and can lead a project from idea to Production
  • You are an excellent communicator with Product Managers, Technical Program Managers, and the Individual Contributors on your team
  • You are comfortable managing a large team spanning all seniority levels, and supporting Individual Contributors in their growth and development
  • You know how to deploy an application from laptop to production, are able to fully automate it, and are comfortable with Production requirements (Observability, Alerting, ...)
  • You are knowledgeable in DevOps principles and CI/CD pipelines
  • You are knowledgeable in Configuration Management and Infrastructure as Code tools such as Chef and Terraform
  • You are knowledgeable in at least one programming language (such as Python, Golang, or Ruby) and are familiar with software craftsmanship
  • You have full professional English proficiency
  • You are able to make decisions and take ownership of them

WE'RE LOOKING FOR SOMEONE WHO CAN LIVE OUR VALUES:
  • GRIT - Problem-solving and perseverance capability in an ever-changing and growing environment.
  • TRUST - Willingness to trust our co-workers and to take ownership.
  • CANDOR - Ability to receive and give constructive feedback.
  • CARE - Genuine care about other team members, our clients, and the decisions we make in the company.
  • HUMILITY - Aptitude for learning from others, putting ego aside.

REMOTE STRATEGY:
Algolia’s flexible workplace model is designed to empower all Algolians to fulfill our mission to power search and discovery with ease. We place an emphasis on an individual’s impact, contribution, and output over their physical location. Algolia is a high-trust environment, and our team members have the autonomy to choose where and when they want to work. We know community comes in many forms and strive to create opportunities for intentional in-person connection in our offices and virtually for our remote colleagues around the world. We have a global presence with physical offices in San Francisco, NYC, Paris, London, Sydney, and Bucharest.