Urgent Hiring for SRE/DevOps Engineer in Toronto, Ontario

Toronto, ON, Canada

https://ca.mncjobz.com/company/artech-information-systems


Job Description

Location:
Toronto, Ontario (4 days onsite)
Duration:
6 months
Our client is seeking to hire a Senior Site Reliability Engineer for its Application Maintenance and Transformation, Data Services, and Integration team. As a Senior Site Reliability Engineer, you will bring an engineering mindset marked by ambition, curiosity, and a focus on outcomes to ensure the performance and reliability of our systems. This role calls for a dynamic individual who excels in a collaborative environment and works with cross-functional teams to establish best practices for observability, monitoring, logging, alerting, and automation.
Required Skills & Qualifications

Bachelor's degree in Computer Science, Electrical or Electronics Engineering or related field, or equivalent experience.
Minimum of 3 years of IT experience in software development and/or maintenance, SRE, or DevOps engineering.
At least 1 year of experience building Java Spring Boot applications and developing REST APIs.
Experience with relational databases such as MS-SQL Server, MySQL, MariaDB, and SingleStore, or with in-memory distributed databases.
Proficiency in containerization platforms such as Docker and container orchestration tools such as Kubernetes (Azure Kubernetes Service or OpenShift Kubernetes Service preferred).
Solid Git skills, with experience using popular CI/CD tools such as Jenkins or UrbanCode Deploy (UCD).
Experience working on Windows and Linux-based infrastructure.
At least 1 year of experience developing cloud-native applications using Java or Python.
Experience writing SQL queries, with skills in query tuning and optimization.
Experience using centralized logging solutions (Splunk or ELK preferred) and active monitoring systems (Dynatrace, etc.).
Experience deploying and operating cloud-native applications in a private cloud (OpenShift) or public cloud (Azure/AWS preferred).
Proactive communication about the status of projects and production issues.
Must be a self-starter who is motivated, resourceful, and driven to work with cross-functional teams in large enterprises with complex organizational structures to meet business delivery timelines.
Financial Services domain knowledge, preferably in Capital Markets and Wealth Management.

Preferred Skills & Qualifications

Experience implementing dashboards to help teams visualize logs, instrumentation, and other data to ensure optimal performance of platform services, infrastructure, and deployed applications (Grafana preferred).
Exposure to data warehousing and data platforms such as Informatica, Snowflake, or Databricks, and to business intelligence tools such as SAP BusinessObjects (BO) or similar.
Experience creating runbooks, processes, and test plans covering the reliability, performance, and related qualities of infrastructure and applications.
Exposure to tools such as PagerDuty, Postman, ServiceNow, SonarQube, NexusIQ, and secrets vault tools.
Exposure to event brokers such as Kafka or IBM MQ, and to mainframe tools and environments.
Exposure to industry disaster recovery (DR) test exercises.

Day-to-Day Responsibilities

Set the vision for the SRE product base (monitoring, alerting, self-healing, reliability testing).
Lead cross-functional collaborations to define and implement best practices for monitoring, logging, and incident response, driving a proactive stance on system health.
Function as the portfolio SME (Subject Matter Expert): understand and document the common components, core functionality, and infrastructure of supported applications.
Actively participate in deploying software applications, automation tools, and IT infrastructure.
Work closely with development teams to understand code changes and their impact on the production environment, ensuring that new releases meet our reliability standards.
Drive transformation by continuously looking for ways to automate existing SRE processes and increase operational efficiency.
Guide the technical direction for future deployments, advocating for reliability and performance improvements based on industry trends and company objectives.
Lead incident management and problem management for in-scope applications, and own the fulfillment of root cause analysis (RCA) action items.
Debug production issues across services and levels of the stack and provide primary operational support.
Perform occasional off-hours support.

For immediate consideration, please click APPLY to begin the screening process with Alex.