in production.
Develop automation tools to reduce manual operational tasks.
Monitor system performance, reliability, and uptime using metrics and SLIs/SLOs.
Implement and maintain CI/CD pipelines for rapid and reliable software delivery.
Troubleshoot production issues and perform root cause analysis (RCA).
Manage cloud infrastructure (AWS/Azure/GCP) with a focus on resilience and efficiency.
Build and improve incident response processes, including on-call rotations.
Work with development teams to adopt SRE best practices such as error budgets and capacity planning.
Ensure security, compliance, and adherence to best practices across all systems.
Improve system architecture for better performance, cost optimization, and scalability.
Required Skills & Qualifications
Solid experience with Linux/Unix systems administration.
Strong programming/scripting skills: Python, Go, Bash, or Ruby.
Hands-on expertise with cloud platforms (AWS, Azure, or GCP).
Experience with containerization and orchestration: Docker, Kubernetes (EKS, AKS, GKE).
Strong understanding of CI/CD pipelines and tooling (Jenkins, GitHub Actions, GitLab CI, ArgoCD).
Knowledge of infrastructure as code (IaC): Terraform, CloudFormation, Ansible.
Strong familiarity with monitoring, logging, and alerting systems.
Experience with distributed systems, load balancing, caching, and performance tuning.
Strong troubleshooting and debugging skills across the stack.
Job Type: Full-time
Pay: $61,701.83-$223,541.88 per year