Site Reliability Architect

Calgary, AB, Canada

Job Description

Introduction
IBM Application Consultants work directly with our clients on key initiatives. You will have the opportunity to build an in-depth understanding of their business issues and implement organizational strategies that drive adoption of change. We are looking for experts who can build credibility and trust with our clients and provide knowledge that addresses individual and unique business needs.

Your Role and Responsibilities
As an IBM Hybrid Cloud Site Reliability Architect, you will help our clients transform and automate their existing and evolving operational landscapes to enable superior reliability and agility across their enterprises. You will:

  • Develop effective tooling, alerts, and response processes to identify and address reliability risks, including automatic problem detection and mitigation
  • Contribute to the design and build of new and evolving systems, including self-healing systems
  • Troubleshoot and improve large-scale distributed systems
  • Provide advice to teams handling major incident responses
  • Engage with product teams to fix production outages and carry forward action items to improve ongoing reliability
  • Create automated processes based on run books
You will have access to the latest education, tools, and technology, and a limitless career path with the world's technology leader. You will learn the latest technologies and tools for auto-healing and auto-resilient systems, FinOps, AIOps, Edge computing, and DataOps. Come to IBM and make a global impact!

This role can be performed from Alberta, Ontario, or New Brunswick.

Required Technical and Professional Expertise
  • Solid experience and expertise in Hybrid Cloud platform development and operations with one or more cloud platforms (AWS, Azure, GCP, IBM Cloud) and container orchestration platform technologies such as Kubernetes and OpenShift, combined with a DevOps mindset
  • Minimum 5 years of proven experience solving difficult engineering problems through hands-on intervention
  • Good software engineering skills, ideally with experience in Python, Go, and/or Java
  • Understanding of Linux system internals and familiarity with the TCP/IP stack, network routing, and load balancing
  • Understanding of Security, Certificates, TLS
  • A systematic approach to troubleshooting and a deep sense of ownership for whatever you work on
  • Fair understanding of mathematical and statistical models to assess trends.
  • Ability to identify the root causes of instability in a high-traffic, distributed system
  • Experience with configuration and troubleshooting of Linux, Java/Scala, Docker systems
  • Understanding of large-scale complex systems from a reliability perspective
  • Passion for resolving reliability issues and identifying strategies to mitigate them going forward
  • Willingness to work in an ever-changing environment
  • You are "lazy": you are passionate about automation and innovations that improve productivity
  • Knowledge of microservices, event-based technologies, API design, web servers, relational and NoSQL databases.
  • End-to-end systems thinking: a broad understanding of enterprise architectures and complex (backend) systems, beyond the individual component itself
Preferred Technical and Professional Expertise
  • Experience with multiple technologies such as: Java, Go, Docker, Kubernetes, Terraform, Python/Ansible, IBM Cloud/AWS/Azure/Google, Ruby, Redis, PostgreSQL, Nginx, Elasticsearch, Kibana, Logstash, Telegraf, Sensu, Debian, PowerDNS, BIND, InfluxDB, SSL (OpenSSL), Grafana, HAProxy, keepalived, Solr, Scala, Kafka, JVMs, RDBMS, NoSQL, Tomcat, WebSphere, WebLogic
  • Experience in resolving reliability issues and identifying strategies to mitigate them going forward
  • Knowledge of GitOps, AIOps, FinOps practices.
  • Experience building applications on cloud infrastructure, e.g., private clouds, public clouds
  • Experience working in an agile team
  • Experience with "build to manage" principles
Must have the ability to work in Canada without sponsorship.

Job Detail

  • Job Id
    JD2043689
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Calgary, AB, Canada
  • Education
    Not mentioned