Systems Reliability Engineer Lead

2 weeks ago


Bogotá, Colombia Scotiabank Full-time

Overview

We are seeking a capable SRE Lead to collaborate with application, infrastructure and business teams to improve the stability, reliability and efficiency of our regional systems using Site Reliability Engineering (SRE) principles. You will work cross-functionally with various teams, lead discussions with technical and business partners, and help standardize operations using IT Service Delivery and IT Service Management practices.

Key Accountabilities

  • Collaborate with Application Engineering, Quality, Product and Data Engineering teams to champion SRE/DevOps culture and practices
  • Work with a team of Reliability Engineers to champion SRE/DevOps practices across the organization
  • Contribute to management of Service Level Objectives with senior development and business leads
  • Refine build, plan and deploy practices to improve stability, reliability, efficiency, repeatability and security; create plans and coordinate with development and business leads to increase service levels, reduce costs, and support delivery velocity
  • Lead troubleshooting of severe incidents with stakeholder communication and problem-solving using best practices
  • Implement, improve and coach service management best practices to enhance overall service delivery
  • Participate in root cause analysis and blameless post-mortems to prevent recurrence
  • Contribute to reliability feature prioritization and design/development of tooling, alerts, and automated responses to address reliability risks
  • Lead in-depth technical and data analyses to gauge service trends and drive improvements
  • Provide proactive technical communication on reliability, stability and efficiency results to senior stakeholders
  • Assist in improving infrastructure automation, efficiency and cost; ensure solutions are automated where possible
  • Define guidelines, tools, practices and processes as an SRE framework for stream-aligned teams
  • Champion adoption of the SRE framework by development teams, product owners and local SREs
  • Promote the practice across IB regions and onboard local SREs
  • Identify opportunities to improve the platform aligned with defined SLOs
  • Manage the KTLO backlog as a product owner
  • Seek opportunities to reduce toil by automating manual and repetitive tasks

Reporting Relationships

  • Primary Manager: Director and Head Systems Reliability and Resiliency SRE
  • Direct Reports: Not applicable
  • Shared Reports: Not applicable

Dimensions

  • Collaborate with System Reliability Engineers across the enterprise to ensure availability and stability of systems and applications
  • Ensure customers are not impacted and protect the bank’s reputation
  • Scope focused on Global Wealth Technology with enterprise interactions

Education / Experience / Other Information

  • Excellent communication skills (verbal and written) at all organizational levels
  • Ability to clearly communicate incident status via email in business-friendly language
  • Ability to represent the team in meetings with senior business and technology executives
  • 8+ years of IT experience with at least 3 years in a leadership capacity
  • Degree in Computer Science, Engineering, or equivalent
  • ITIL v3/v4 Foundation in ITSM is an asset
  • Experience with ITSM tools (ServiceNow preferred) and strong understanding of SRE and service management
  • Broad knowledge of OS platforms (Linux/UNIX), networking, web systems and IT operations
  • Experience with large-scale distributed systems
  • Understanding of SOA or microservices architectures
  • Experience with CI tools and Agile environments
  • Strong organizational skills and ability to manage multiple tasks
  • Calm under pressure
  • Proficient in reliability practices, SLOs/SLIs/SLAs, and error budget management
  • Experience with logging/tracing tools (e.g., Splunk, Fluent Bit, OpenTelemetry) and monitoring tools (e.g., Dynatrace, Spring Boot monitoring)
  • Experience with infrastructure technologies (ArgoCD, Terraform, Kubernetes, Helm, Docker, Istio, GCP)
  • Experience with automation tools (Jenkins, Python, Bitbucket, Bash scripting)
  • Experience in root cause analysis, blameless post-mortems, and playbook/runbook development
  • Experience with non-functional testing tools (OpenAPI, Grafana k6, Gatling, Postman)
  • Nice-to-have: chaos engineering tooling (Gremlin, Netflix Chaos Monkey, Litmus Chaos or similar)

Location: Bogotá, Bogotá District, Colombia.

ScotiaTech is a Scotiabank unit based in Bogotá that supports technology systems and processes. Applicants should apply online; only shortlisted candidates will be contacted for interviews.



  • Bogotá, Colombia Scotiabank Full-time

    As a member of the International Banking Systems Reliability Office team, the System Reliability Engineer (SRE) will collaborate with a team that will work with application teams, infrastructure teams, and business partners to continuously improve the stability, reliability and efficiency of our regional systems through Site Reliability Engineering (SRE)...


  • Bogotá, Colombia Exari Systems Full-time

    Lead Site Reliability Engineer Coupa makes margins multiply through its community-generated AI and industry-leading total spend management platform for businesses large and small. Coupa AI is informed by trillions of dollars of direct and indirect spend data across a global network of 10M+ buyers and suppliers. We empower you with the ability to predict,...


  • Bogotá, Capital District, Colombia Scotiabank Full-time

    Requisition ID: Employee Referral Program – Potential Payout: $400,000.00 We are committed to investing in our employees and helping them continue their professional careers at ScotiaTech. As a member of the International Banking Systems Reliability Office team, the System Reliability Engineer (SRE) will collaborate with a team that will...


  • Bogotá, Colombia Scotiabank Full-time

    **Requisition ID**: 237005 **Employee Referral Program - Potential Payout**: $400,000.00 We are committed to investing in our employees and helping them continue their professional careers at ScotiaTech. You will work cross-functionally amongst a variety of teams and be a core contributor in all significant engineering service or solutions...


  • Bogotá, Bogotá D.E., Colombia Exari Systems Full-time

    Coupa makes margins multiply through its community-generated AI and industry-leading total spend management platform for businesses large and small. Coupa AI is informed by trillions of dollars of direct and indirect spend data across a global network of 10M+ buyers and suppliers. We empower you with the ability to predict, prescribe, and...

  • Site Reliability Engineer

    2 weeks ago


    Bogotá, Bogotá D.E., Colombia CBL Solutions Full-time

    Role: Site Reliability Engineer. Location: Medellín or Bogotá, Colombia. Contract position. Requirements: 8 years of relevant experience; B1 English speaker. Skills & Experience: 8 years of relevant experience. Expert-level knowledge of distributed systems and cloud infrastructure. Extensive experience with automation and orchestration tools. Deep understanding of...


  • Bogotá, Bogotá D.E., Colombia Masabi Full-time

    Introducing Masabi // At Masabi, we're driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank card to travel. Our Justride...


  • Bogotá, Colombia Coupa Full-time

    Lead Site Reliability Engineer (10929) Location: Bogota, D.C., Capital District, Colombia Why join Coupa Pioneering Technology: At Coupa, we're at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility in their spend. Collaborative Culture: We value collaboration and teamwork, and our...

  • Site Reliability Engineer

    2 weeks ago


    Bogotá, Colombia Sur Global Full-time

    Sur Global, Bogota, D.C., Capital District, Colombia Site Reliability Engineer As the Site Reliability Engineer you will support and scale the infrastructure powering our secure, mission-critical SaaS platform. You must be confident in operating and debugging both modern infrastructure (cloud-native, containerized services) and classic Windows production...


  • Bogotá, Colombia Wise Athena Full-time

    **Join Our Team as an SRE!** Wise Athena is looking for a **Site Reliability Engineer (SRE)** to join our dynamic and innovative team! At our company, we’re revolutionizing Revenue Growth Management (RGM) with the power of AI. You will work with a passionate, forward-thinking team. This is a fully remote position. **Key Responsibilities** - **Problem...