Manager, Site Reliability Engineering

Company:  Lightspeed
Location: Toronto
Closing Date: 02/08/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
We are looking for a Manager, Site Reliability Engineering, to lead the team of cloud platform engineers building and supporting the infrastructure backing critical Lightspeed services. The platform covers the full cycle of software delivery, from CI/CD pipelines to high-availability, scalable production environments.

Role:
  • Highly autonomous role responsible for the team’s overall direction and execution.
  • Own the full scope of production frameworks, tools and infrastructure for delivering and running services in production environments on multiple clouds (GCP, AWS).
  • Define the team’s vision; rationalize and prioritize projects with an emphasis on improving developer experience, production stability and scalability.
  • Build the team’s roadmap, establish the processes to execute on it, and keep the team on track.
  • Hands-on and highly technical: set the technical direction for the team and guide the selection and evolution of technologies and tools.
  • Empower and grow the team of 5-10 members.
  • Work closely with multiple development teams to understand their pain points and how to unlock more value and productivity.
  • Lead the team to design, build and maintain robust infrastructure built upon GCP and AWS, leveraging cloud native technologies such as Terraform, GKE, Cloud SQL, BigQuery, etc.
  • Improve, simplify and manage CI/CD pipelines for efficient deployment and release using a number of technologies (GitLab, GitHub, Helm, Terraform, CircleCI, Jenkins, etc.).
  • Participate in the incident management process and conduct post-mortem analysis to prevent future outages.
  • Manage infrastructure change through infrastructure as code (IaC).
  • Be part of our on-call rotation.

And a little bit of:
  • Take initiative to identify broader opportunities to improve the platform and processes across the company.
  • Contribute to the team’s objectives hands-on as needed.

What will make you successful:
  • The team has a clear vision and a roadmap that is aligned to it.
  • The team delivers on the roadmap milestones.
  • A KPI for developer productivity is established and tracked, and the team drives improvements in productivity.
  • Stability KPI: the number of incidents related to production outages.
  • Scalability KPI: the platform handles traffic growth and spikes without going down.
  • Costs KPI: the platform is optimized for cost savings and high utilization; costs are monitored and spikes are alerted on.

Experience:
  • A Bachelor’s degree in Computer Science or Engineering, or equivalent practical expertise, as a foundational knowledge base.
  • Demonstrated proficiency in overseeing production environments.
  • Extensive hands-on experience with GCP and AWS.
  • Proven expertise in orchestrating and managing infrastructure through code, streamlining operations and promoting automation.
  • Experience building and managing CI/CD pipelines to streamline software development processes.
  • Experience leading small to medium-sized teams of platform infrastructure engineers.
  • Ability to collaborate closely with stakeholders across different teams and disciplines.

Skills you will bring to the team:
  • Self-starter, able to set the direction for the team.
  • Comfortable dealing with ambiguity, able to structure a complex problem space and turn it into an executable roadmap.
  • Strong ownership, taking full responsibility for the team’s scope and successful execution.
  • Strong execution, with the ability to structure the processes to get things done.
  • Quick learner, able to dig deep and understand new technologies and frameworks.
  • Ability to get the team excited about the vision, and to empower and coach.
  • Solid collaboration and communication skills.
  • Deep understanding of, and hands-on experience building, scalable software delivery infrastructure on GCP and AWS.

What’s in it for you:
  • Join a growing team and help us move to the next level.
  • Amazing benefits & perks, including equity for all Lightspeeders.
  • Constant development of both your skill set and business acumen, with limitless growth opportunities.
  • Lots of autonomy and a flexible work culture.
  • Innovation time to explore and learn at work.
  • Shaping the company by joining cultural & technical committees.
  • Opportunity to join a fast-paced, high-growth company.
  • Opportunity to learn, expand your skill set, forge wonderful relationships and make your mark within the diverse and inclusive Lightspeed family, a true Canadian tech success story.

… And enjoy a range of benefits that will keep you happy, healthy and (not) hungry:
  • Lightspeed equity scheme (we are all owners).
  • Flexible paid time off and remote work policies.
  • Health insurance.
  • Contributions to your pension plan (RRSP).
  • Health and wellness benefit of $500 per year.
  • Paid leave and assistance for new parents.
  • Mental health online platform and counselling & coaching services.
  • Training opportunities to grow your skills and career.
  • Volunteer day.
  • Fully stocked kitchen (hot and cold beverages, meals served).
  • Happy hours to build your relationships with colleagues after work.