Production Management | Level 3 (cad) Rsr Msp

Montreal, QC, Canada

https://ca.mncjobz.com/company/lancesoft

Job Description

Job Title: Site Reliability Engineer
Experience Level: Level 3 (senior): 5-7 years
Location: Montreal (Day 1 onboarding onsite / in-office presence 3x per week)
The client is seeking a Site Reliability Engineer to drive reliability engineering, operational support, and customer consultation services for several key products. The team is part of the Application Infrastructure organization and is responsible for shaping the SDLC within the client by implementing the tools, systems, and processes used by 17,000+ developers in the Firm for software development and deployment.
Job Responsibilities

Building and maintaining front-to-back knowledge of the development environment
Maximizing the availability and performance of supported systems through optimized and automated plant management, ongoing problem management, and architecture reviews with dev-side peers
Reduction of the cost of support (hours of effort) through the elimination of operational issues, optimization and automation of tasks, development of operational tools and driving client self-service to minimize constraints
Identification and prioritization of technical debt that is impacting client (i.e., software developer) productivity, system reliability, or the efficiency of the Ops team
Collaboration with other SREs in Application Infrastructure to share solutions
Complex troubleshooting of front-to-back development environment issues
Maximizing Ops team product knowledge and support capabilities to minimize the escalation rate to the department's feature engineers/developers
Consulting with clients (the Firm's development community) to maximize productivity, including troubleshooting their issues with using MSDE's solutions
Experimentation with new tools and techniques
Being operationally responsive, including sharing the on-call rotation with the rest of the global team (with a time-off-in-lieu system)

Required Qualifications / Skills

Strong Linux troubleshooting skills
Task automation experience in any programming language, preferably Python
Practical experience implementing monitoring / observability solutions using Prometheus and Grafana
Experience using version control (Bitbucket, GitHub), issue tracking (Jira), continuous integration (Jenkins, Azure DevOps, GitHub Actions), automated testing, or deployment automation
Excellent communication skills to work with peers / third-party vendors
Confident collaboration skills

Desired Skills

Experience with site reliability engineering practices, such as service level objectives (SLOs), error budgets, blameless postmortems, and toil reduction
Experience with Docker / Kubernetes

EEO Employer
Minorities / Females / Disabled / Veterans / Gender Identity / Sexual Orientation
