Urgent Hiring for SRE/DevOps Engineer in Toronto, Ontario

Toronto, ON, Canada

Job Description

Location: Toronto, Ontario (4 days onsite)
Duration: 6 months
Our client is seeking to hire a Senior Site Reliability Engineer for its Application Maintenance and Transformation, Data Services, and Integration team. As a Senior Site Reliability Engineer, you will bring an engineering mindset of ambition, curiosity, and outcome focus to ensure the performance and reliability of our systems. This role calls for a dynamic individual who excels in a collaborative environment, interacting with cross-functional teams to establish best practices for observability, monitoring, logging, alerting, and automation.
Required Skills & Qualifications

  • Bachelor's degree in Computer Science, Electrical or Electronics Engineering or related field, or equivalent experience.
  • Minimum of 3 years of IT experience in software development and/or maintenance, SRE, or DevOps engineering.
  • At least 1 year of experience building Java Spring Boot applications and REST API development.
  • Experience with relational databases such as MS-SQL Server, MySQL, MariaDB, and SingleStore or in-memory distributed databases.
  • Proficiency in containerization platforms such as Docker and container orchestration tools such as Kubernetes (Azure Kubernetes Service or OpenShift preferred).
  • Solid Git skills and experience with popular CI tools such as Jenkins or UCD.
  • Experience working on Windows and Linux-based infrastructure.
  • At least 1 year of experience developing cloud-native applications using Java or Python.
  • Experience writing, tuning, and optimizing SQL queries.
  • Experience using centralized logging solutions (Splunk, ELK preferred) and active monitoring systems (Dynatrace, etc.).
  • Experience deploying and operating cloud-native applications in a private cloud (OpenShift) or public cloud (Azure/AWS preferred).
  • Proactive communication skills around the status of projects/issues in production.
  • Must be a self-starter, motivated, resourceful, and driven to work with cross-functional teams in large enterprises with complex organizational structures to meet business timelines on delivery.
  • Financial Services domain knowledge, preferably in Capital Markets and Wealth Management.
Preferred Skills & Qualifications
  • Experience implementing dashboards to help teams visualize logs, instrumentation, and other data to ensure optimal performance of platform services, infrastructure, and deployed applications (Grafana preferred).
  • Exposure to data warehousing tools such as Informatica, Snowflake, or Databricks, and business intelligence tools such as SAP BO or similar.
  • Experience creating runbooks, processes, and test plans around reliability, performance, etc., of infrastructure and applications.
  • Exposure to tools such as PagerDuty, Postman, ServiceNow, SonarQube, NexusIQ, and vault tools.
  • Exposure to event brokers such as Kafka or IBM MQ, and to mainframe tools and environments.
  • Exposure to industry disaster recovery test exercises.
Day-to-Day Responsibilities
  • Set the vision for the SRE product base (monitoring, alerting, self-healing, reliability testing).
  • Lead cross-functional collaborations to define and implement best practices for monitoring, logging, and incident response, driving a proactive stance on system health.
  • Function as the portfolio subject matter expert (SME): understand and document common components, core functionalities, and infrastructure of supported applications.
  • Actively participate in deploying software applications, automation tools, and IT infrastructure.
  • Work closely with development teams to understand code changes and their impact on the production environment, ensuring that new releases meet our reliability standards.
  • Drive transformation by continuously looking for ways to automate existing SRE processes and increase operational efficiency.
  • Guide the technical direction for future deployments, advocating for reliability and performance improvements based on industry trends and company objectives.
  • Lead incident management and problem management for in-scope applications, and own the fulfillment of root cause analysis (RCA) action items.
  • Debug production issues across services and levels of the stack and provide primary operational support.
  • Perform occasional off-hours support.
For immediate consideration, please click APPLY to begin the screening process with Alex.

Job Detail

  • Job Id: JD2944882
  • Total Positions: 1
  • Job Type: Full Time
  • Employment Status: Permanent
  • Job Location: Toronto, ON, Canada