Site Reliability Engineer (remote Canada)

Toronto, ON, Canada

Job Description

Why We Need You:
We are looking for an experienced Site Reliability Engineer to join Uptime.com and help us build reliable, robust software solutions for our customers. As a Site Reliability Engineer, you will be responsible for monitoring system performance, troubleshooting technical issues, deploying code changes, and collaborating with other teams to ensure the best possible customer experience. The ideal candidate has extensive experience in cloud infrastructure, distributed systems engineering, scripting, and automation, including container tooling such as Docker and Kubernetes. You should also have the skills to manage service outages and to keep systems available by writing scalable software.

What You Will Do:

Monitor system health metrics to proactively identify potential bottlenecks or errors
Develop strategies for resolving performance issues and identify areas for improvement
Manage monitoring tools like Grafana and Prometheus, including deploying them and optimizing their usage
Deploy releases of applications and services in collaboration with developers
Troubleshoot production outages and implement fault tolerance solutions
Maintain documentation related to system operation procedures
Document and test game-day scenarios
Develop and support automation that allows for continuous testing of software created by the team
Design and assist in the setup and maintenance of application monitoring and alerting
Assist in designing and deploying HA/DR architecture for mission-critical workloads
Collaborate with other teams to ensure optimal performance of systems and dependent resources
Participate in on-call duty rotation

Requirements

What You Will Need:

Bachelor's degree in Computer Science or relevant field preferred
5+ years of experience in SRE/DevOps roles
Strong communicator, able to clearly articulate complex issues and technologies
Expertise in Linux server administration and scripting languages (Python)
Knowledge of containerization technologies like Docker & Kubernetes
Proficient in a modern programming language such as Go or Python
Deep understanding of modern microservices-based architectures and operations
Experience with defensive coding practices and patterns for high availability
Familiarity with configuration management tools
Excellent problem-solving skills and strong collaboration abilities
Comfortable working in a fast-paced agile environment: requirements change quickly, and our team needs to constantly adapt to moving targets

Benefits

How We Will Support Your Growth and Success:

Partner with executives, leadership, and cross-functional teams, including engineering, marketing, and business operations.
Professional development opportunities to further your skills and knowledge.
Discover the exciting world of monitoring, observability, and SRE while becoming an advocate and driving innovation in the industry.
A supportive team of passionate and dedicated individuals all focused on building the best monitoring service in the world.

Health Care Plan (Medical, Dental & Vision)
Paid Time Off (Vacation, Sick & Public Holidays)
Family Leave (Maternity, Paternity)
Training & Development
Work From Home

Job Detail

Job Id: JD2190485
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Permanent
Job Location: Toronto, ON, Canada
Education: Not mentioned