Everflow is a SaaS Partner Marketing platform for managing and scaling revenue from affiliates, partnerships, and marketing channels.
Founded in 2016 by industry veterans, we are based in Oakland, Montreal, and Amsterdam, with a distributed team across the NAM and EMEA regions.
We're a bootstrapped company (over $27M ARR) that has grown by supporting happy customers who refer business to us and create word-of-mouth excitement. We're the go-to platform for brands, agencies, and networks, with 1,000+ customers including Mutual of Omaha, ClassPass, and Tapjoy.
We are only accepting applications from candidates who currently reside in Montreal, QC.
About the Role
Role Overview:
As a Site Reliability Engineer at Everflow, you will play a critical role in maintaining the reliability, scalability, and performance of our cloud infrastructure. You will collaborate closely with our engineering team to automate, monitor, and optimize our services, ensuring they meet the demands of our rapidly growing user base. You will be working primarily with Google Cloud, utilizing its full suite of services to build and maintain a resilient infrastructure.
Key Responsibilities:
On-Call Responsibilities:
Participate in a 24/7 on-call rotation to provide rapid response to production incidents.
Infrastructure Management:
Design, deploy, and manage scalable, reliable, and secure cloud infrastructure using Google Cloud Platform (GCP) and Kubernetes.
Automation:
Develop and maintain CI/CD pipelines, ensuring smooth and efficient deployment processes.
Monitoring & Incident Management:
Implement monitoring and alerting systems to detect and respond to infrastructure issues proactively. Lead the incident response process, minimizing downtime and impact.
Performance Optimization:
Analyze system performance and implement improvements to ensure low latency and high availability of services.
Security:
Collaborate with the security team to implement best practices in infrastructure security, including managing IAM roles, firewall rules, and network policies.
Collaboration:
Work closely with the engineering team to integrate DevOps practices into the development lifecycle, promoting a culture of continuous improvement.
Documentation:
Create and maintain detailed documentation for infrastructure, processes, and incident responses.
Required Qualifications:
Experience:
3 to 5 years of experience as a Site Reliability Engineer, DevOps Engineer, Software Engineer or in a similar role.
Cloud Expertise:
Strong experience with Google Cloud Platform, including services like Compute Engine, Cloud Storage, Cloud Armor, GKE, Cloud SQL, Memorystore and IAM.
Software Development:
Proficiency in programming languages (e.g., Java, Go) and automation tools (e.g., Terraform, Ansible).
Monitoring & Logging:
Experience with monitoring tools such as Prometheus, Grafana, or similar, and logging solutions like Stackdriver or ELK.
CI/CD Pipelines:
Hands-on experience with CI/CD tools (e.g., Jenkins, GitLab CI, Cloud Build) and best practices.
Problem-Solving:
Strong analytical and troubleshooting skills, with the ability to manage complex infrastructure challenges.
Team Player:
Excellent communication skills and the ability to work effectively in a collaborative environment.
Preferred Qualifications:
Performance Optimization:
Ability to track down performance bottlenecks in a distributed system and implement solutions.
Performance Testing & Benchmarking:
Expertise in load testing frameworks (JMeter, K6, Gatling) and in establishing performance baselines, SLAs, and continuous performance regression testing.
Network Management:
Understanding of network protocols and experience with VPCs, load balancers, and VPNs in a cloud environment.
The pay range for this role is:
110,000 to 130,000 CAD per year (hybrid, Montreal, Quebec, CA)