Staff Site Reliability Engineer, PaaS

Company:  Algolia
Location: Paris
Closing Date: 25/11/2024
Salary: £80 - £100 Per Annum
Type: Temporary
Job Requirements / Description
Algolia is set to enable every company to create world-class Search and Discovery experiences with an API-first approach. Performance and scalability are at the heart of our mission: we power 1.5 trillion searches a year, for 10K+ customers all over the world. If you're a problem solver, able to think outside the box and eager to nurture others and learn from them, then this is your challenge!

The Team

The Platform as a Service (PaaS) team is dedicated to empowering development teams by creating toolchains, guidelines, and standards. Our focus is on enabling seamless automation and CI/CD, comprehensive observability, and unwavering reliability in a secured cloud-native environment.

The Opportunity

The Staff Engineer position within the Platform as a Service team offers a compelling opportunity for an adept professional with a rich background in architecting, constructing, and managing scalable infrastructures. This role concentrates on three key areas: CI/CD, observability, and application hosting.

As a senior member of the Platform as a Service team, you will wield significant influence over Algolia's Search Products. Your responsibilities will revolve around crafting and executing systems pivotal to ensuring reliability, scalability, and cost optimisation. You will be instrumental in architecting robust CI/CD pipelines, establishing comprehensive observability frameworks, and managing hosting solutions focused on API Management and micro-services management.

As an expert within the team, you will also actively mentor and guide fellow team members, fostering a culture of collaboration and excellence. The role further entails cross-team collaboration, spearheading projects alongside SREs and SWEs.

Your role will consist of:

  • Design and deploy a cloud-native API Management solution to boost platform scalability, security, and reliability, while expediting new feature setup for swift and seamless onboarding of development teams.
  • Spearhead the design and implementation of a robust and scalable CI/CD toolchain, serving as a centralised build factory to streamline development processes and ensure consistent quality across all services hosted on the product platform.
  • Lead the development and deployment of comprehensive observability standards and automation solutions, empowering teams with actionable insights and enabling proactive resolution of issues, enhancing overall system reliability and performance.
  • Drive the evolution and maintenance of a Kubernetes-based architecture, optimising resource utilisation, enhancing fault tolerance, and ensuring the platform's ability to meet evolving demands efficiently and effectively.
  • Provide guidance and mentorship to other SRE team members, helping them to develop their skills and knowledge of best practices in site reliability engineering.
  • Establish and enforce engineering processes and best practices that ensure high-quality, reliable, and scalable systems, and work with other teams to promote the adoption of these processes and practices across the organization.
  • Collaborate with senior leadership to shape the vision and direction of the company's (cloud) infrastructure, and help drive the development of SRE-specific strategies and initiatives that align with business objectives.
  • Build and maintain strong relationships with stakeholders across the organization, and represent the SRE organization in cross-functional meetings and discussions.

You might be a fit if you have:

  • Strong knowledge of programming languages such as Golang and Python; familiarity with Ruby is a plus.
  • Experience designing and building API Management and Kubernetes-based architecture.
  • Experience building and operating distributed systems at scale.
  • Experience with CI/CD setup and architecture; strong knowledge of GitHub Actions, CircleCI, or alternatives is expected.
  • Experience designing new applications with reliability, operability, and availability in mind.
  • Experience with Public Cloud Providers such as GCP, AWS, or Microsoft Azure, and administration of Kubernetes.
  • Excellent communication and organisation skills.

We're looking for someone who can live our values:

  • GRIT - Problem-solving and perseverance capability in an ever-changing and growing environment.
  • TRUST - Willingness to trust our co-workers and to take ownership.
  • CANDOR - Ability to receive and give constructive feedback.
  • CARE - Genuine care about other team members, our clients, and the decisions we make in the company.
  • HUMILITY - Aptitude for learning from others, putting ego aside.