Site Reliability Engineer Data Platform Hybrid

Mississauga, ON, Canada

Job Description


Now is an extremely exciting time to join a newly formed group within Citi. The Institutional Clients Group - Engineering and Architecture Practice (EAP) is responsible for defining and building core architecture and technology strategy for the ICG.

This position is in the Kafka-as-a-Service team, which sits under Common Platform Engineering (CPE). CPE is a department within the EAP group whose mission is to provide engineering for common platform capabilities in ICG, to engineer solutions that codify the firm's data strategy into frameworks and tools, and to define 'Common Product' standards that enable efficient adoption of common components.

We are looking for a Site Reliability Engineer with a software engineering background who is passionate about running large-scale, multi-tenant distributed data systems for customers that expect a very high level of availability. In this role, you will be responsible for the availability, performance, monitoring, emergency response, and capacity planning of the data systems.

If you love the hum of big data systems, enjoy thinking about how to make them run as smoothly as possible, and want a big influence on the architecture and operational design of those systems, then you will fit right in. Your solutions will be leveraged by tens of thousands of developers across Citi, supporting applications used by hundreds of thousands of internal and client users.

What you'll be doing:

Design & build observability solutions for distributed systems

Contribute to the continuous automation of toil, and drive & evangelize the four key DORA metrics

Establish Service Level Objectives for core services, monitor their Service Level Indicators, and implement error-budget based alerting

Help the operations team by building solutions that allow them to identify and resolve health issues in the data systems as quickly as possible

Automate the deployment of infrastructure and applications for data systems such as Kafka

Support the rapid growth of the platform by expanding its strategy to deploy into OpenShift and managed Kubernetes cloud environments (AWS EKS, Google GKE)

Design and implement service improvements for performance and security, relentlessly improve reliability, and facilitate effective incident response, mitigation, and resolution

Write and review technical documents, including design, requirements, and process documentation

Advocate for a culture of platform automation, with an obsession for an everything-as-code approach

What we are looking for:

4+ years' experience in Site Reliability Engineering, creating scalable and highly reliable systems

Strong fundamentals in distributed systems design and operation with experience building automation to operate large-scale data systems

Experience designing & implementing observability solutions for data systems to enable a holistic view of system health

Strong understanding of modern site reliability engineering practices and ability to apply them to improve the reliability of systems

Experience creating, deploying, and managing the lifecycle of containerised applications on Kubernetes

Experience in an agile development environment with modern programming languages such as Python, Golang, Java, Kotlin, Scala, or similar

What gives you an edge:

Experience working with distributed systems and stream processing solutions; hands-on experience with Apache Kafka is highly desirable

Strong grasp of DevSecOps practices and ability to contribute to improving systems reliability, quality, and time-to-market

Experience designing and implementing multiple automated deployment pipelines at both the application and infrastructure levels. Ideally, you would have experience with Ansible and Terraform on multiple projects

Experience working with the HashiCorp tool set, specifically Vault for secrets management and Consul for service discovery

Experience deploying applications and infrastructure into the cloud

Citi Canada is an equal opportunity employer. Accordingly, we will make accommodations to respond to the needs of people with disabilities (including, without limitation, physical and mental health disabilities) during the recruitment process and otherwise in accordance with law. Individuals who view themselves as Aboriginals, members of visible minority or racialized communities, and people with disabilities are encouraged to apply.



Job Family Group: Technology



Job Family: Applications Development



Time Type: Full time



Citi is an equal opportunity and affirmative action employer.

Qualified applicants will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Citigroup Inc. and its subsidiaries ("Citi") invite all qualified interested applicants to apply for career opportunities. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity review .



Job Detail

  • Job Id
    JD2188645
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Mississauga, ON, Canada
  • Education
    Not mentioned