AI-First. Future-Driven. Human-Centered.
At OpenText, AI is at the heart of everything we do--powering innovation, transforming work, and empowering digital knowledge workers. We're hiring talent that AI can't replace to help us shape the future of information management. Join us.
Your Impact
As a Lead Site Reliability Administrator, you'll guide a globally distributed team responsible for managing and scaling the data services that power our customer-facing SaaS products. These services include Kafka, Elasticsearch, Cassandra, Solr, Redis, and OpenSearch. You'll oversee operations across on-prem and public cloud environments (AWS, Azure, GCP) to ensure systems are reliable, secure, and high-performing.
In this role, you'll provide technical leadership, mentor team members, and drive improvements in automation, observability, and reliability practices. You'll influence architecture decisions, optimize operational processes, and help shape the overall strategy for our data services platform.
If you enjoy solving complex challenges, working with distributed systems, and enabling teams to succeed, this role offers the chance to make a big impact while growing your expertise in advanced SRE practices.
What You'll Do
Lead the management and scaling of distributed data services (Kafka, Elasticsearch, Cassandra, Solr, Redis, OpenSearch).
Provide technical guidance and mentorship to SREs and engineering teams.
Design, build, and maintain infrastructure across on-prem and cloud environments (AWS, Azure, GCP).
Develop and maintain Infrastructure as Code (IaC) templates using Terraform and Ansible.
Ensure systems are patched, secure, and compliant with internal standards.
Drive incident response, root cause analysis, and post-mortem improvements for critical services.
Collaborate with engineering teams to design, deploy, and monitor highly available and resilient data platforms.
Lead capacity planning, performance tuning, and optimization efforts.
Create and maintain operational documentation, runbooks, and best practices.
Promote knowledge sharing, training, and continuous improvement within the SRE team.
Influence automation, observability, and reliability strategies across the organization.
Support service requests and ensure SLA/OLA compliance.
Participate in a 24x7 on-call rotation (may include shift work).
What You Need to Succeed
Bachelor's degree in Computer Science or related field.
7+ years of experience in IT with a focus on large-scale enterprise systems.
4+ years managing distributed data platforms (Kafka, Elasticsearch, Cassandra, Solr, Redis); experience with OpenSearch is a plus.
3+ years working with automation tools like Terraform and Ansible.
Strong knowledge of Linux systems and cloud infrastructure (AWS, Azure, GCP).
Excellent troubleshooting, problem-solving, and decision-making skills.
Proven experience leading teams, mentoring, and influencing technical direction.
Strong written and verbal communication skills.
Detail-oriented, proactive, and able to manage multiple priorities in a fast-paced environment.
Familiarity with ITIL principles (certification is a plus).
Experience with observability tools (Prometheus, Grafana, ELK stack) is a bonus.
One Last Thing
OpenText's efforts to build an inclusive work environment go beyond simply complying with applicable laws. Our Employment Equity and Diversity Policy provides direction on maintaining a working environment that is inclusive of everyone, regardless of culture, national origin, race, color, gender, gender identification, sexual orientation, family status, age, veteran status, disability, religion, or other basis protected by applicable laws.
If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please submit a ticket at Ask HR. Our proactive approach fosters collaboration, innovation, and personal growth, enriching OpenText's vibrant workplace.