Systems Reliability Engineer Lead

2 weeks ago


Bogotá, Distrito Capital, Colombia Scotiabank Full-time

Requisition ID:
Employee Referral Program - Potential Payout: $400,000.00

We are committed to investing in our employees and helping them continue their professional careers at ScotiaTech.

As a member of the International Banking Systems Reliability Office team, the System Reliability Engineer (SRE) will collaborate with a team that works with application teams, infrastructure teams, and business partners to continuously improve the stability, reliability and efficiency of our regional systems through Site Reliability Engineering (SRE) principles and practices, including continuous people, process and technology ("automating all the things") enhancements in support of our rapidly changing technology product portfolio. You will work cross-functionally with a variety of teams and be a core contributor to all significant engineering services or solutions delivered to Systems Reliability Office stakeholders. You will also have an understanding of "what could go wrong", solve complex problems, and have a flair for communicating and leading discussions with technical and business partners. You will work directly with Application Engineering teams to both maintain and operate our existing technology and build our next generation of technologies. You will leverage your deep experience with IT Service Delivery and IT Service Management to standardize and improve operations, analysis and service levels across the International Banking portfolio.

Key Accountabilities

  • Work in collaboration with Application Engineering, Quality, Product and Data Engineering teams to champion SRE/DevOps culture and practices.
  • Collaborate with a team of Reliability Engineers working closely with software development, Quality, Product and Data Engineering teams as a champion of SRE/DevOps culture and practices.
  • Contribute to the management of Service Level Objectives with senior development and business leads.
  • Contribute to initiatives to continuously refine our build, plan and deploy practices for improved stability, reliability, efficiency, repeatability and security. You'll create plans and collaborate with other SROs and DevOps team members, coordinating activity with development and business leads to increase service levels, lower costs, and support delivery velocity objectives.
  • Work closely with development and operations teams to lead troubleshooting of our most severe incidents: leading senior stakeholder communication, driving problem-solving (e.g., log analysis, non-invasive tests) and debugging with best-practice techniques.
  • Working with application teams, implement, improve and coach service management best practices to improve overall service delivery.
  • Participate in the continuous improvement and execution of high-quality, timely major incident root cause analysis and blameless post-mortem activities to ensure we take action to avoid similar problems in the future.
  • Contribute to the prioritization of reliability features and to the design, development and delivery of effective tooling, alerts, and automated responses to identify and address reliability risks.
  • Lead in-depth technical and data analysis to gauge service trends and drive improvements.
  • Contribute to proactive technical communication of reliability, stability and efficiency results (based on Service Level Objectives), service health (via dashboards), and key reliability risks and issues to senior business and technology stakeholders, in order to prioritize activity (based on trend analysis) and direct investment and action.
  • Assist in improving infrastructure automation, efficiency, and cost.
  • Ensure solutions are automated where possible while improving operational efficiency, reducing operating risk and delivering quality services.
  • Define guidelines, tools, good practices and processes as an SRE Framework to be implemented by the stream-aligned teams.
  • Be responsible for the adoption of the SRE Framework by (stream-aligned) development teams, the product owners team and local SREs.
  • Act as champion/ambassador of the practice across the IB regions.
  • Ensure that local SREs are onboarded to the practice.
  • Constantly seek opportunities to improve the platform in line with the defined SLOs.
  • Manage the KTLO backlog, acting as a product owner.
  • Constantly seek new toil reduction opportunities, automating manual and repetitive work as much as possible.

Reporting Relationships

Primary Manager: Director and Head Systems Reliability and Resiliency SRE
Direct Reports: Not applicable
Shared Reports: Not applicable

Dimensions

Working closely with System Reliability Engineers across the Enterprise to ensure the availability and stability of systems and applications. Ensuring customers are not impacted and the bank's reputation and brand are not negatively affected. Scope is focused on Global Wealth Technology and will extend to enterprise interactions.

Education / Experience / Other Information

  • Excellent communication (both verbal and written). The ability to communicate confidently and clearly on conference calls, in meetings, via email, etc. at all levels of the organization is essential.
  • Ability to quickly and clearly communicate incident status via email in business-friendly language.
  • Ability to represent the team in meetings and presentations that include senior business/technology executives.
  • 8+ years' experience in IT, with at least 3 years in a leadership capacity (directly or indirectly).
  • Degree in Computer Science, Engineering, or equivalent experience.
  • ITIL v3/v4 Foundation certification in ITSM is an asset.
  • Experience with ITSM tools (ServiceNow a plus), with a strong understanding of SRE and service management principles.
  • Well-rounded, broad knowledge of OS platforms (Linux/UNIX), networking, web systems and IT Ops.
  • Experience working with large-scale distributed systems.
  • Advanced understanding of SOA or microservices architecture concepts.
  • Advanced understanding of continuous integration systems and toolsets.
  • Experience working in an Agile environment.
  • Strong organizational skills and the ability to effectively manage multiple tasks simultaneously.
  • Capable of working in a complex and fast-paced environment.
  • Ability to maintain calm during stressful situations.
  • Proficient in reliability practices: definition of and follow-up on SLOs, SLIs and SLAs, and error budget management.
  • Experience with logging and tracing tools: Splunk, Fluent Bit, OpenTelemetry.
  • Experience with monitoring tools: Dynatrace, Spring Boot microservices monitoring.
  • Experience with infrastructure technology: ArgoCD, Terraform, Kubernetes, Helm, Docker, Istio, GCP cloud.
  • Experience with automation tools and technology: Jenkins, Python, Bitbucket, Bash scripting.
  • Experience in log analysis, root cause analysis, blameless post-mortems, and playbook/runbook formulation.
  • Experience with tools and practices for non-functional testing: OpenAPI specification, Grafana k6, Gatling, Postman.
  • Nice to have: experience with chaos engineering tools such as Gremlin, Netflix Chaos Monkey, LitmusChaos or similar.

Location(s): Colombia : Bogota : Bogota

ScotiaTech is a business unit of ScotiaGBS, a Scotiabank group of companies, located in Bogotá, Colombia. ScotiaTech was created to support various technology systems and processes of the Bank. We offer an inclusive and positive work environment, as well as competitive benefits. At ScotiaTech, we value the unique skills and experiences each person brings and are committed to creating and maintaining an inclusive and accessible environment for everyone. Candidates must apply directly online to be considered for this position. We thank all candidates for their interest in this career opportunity at ScotiaTech; however, only those selected for an interview will be contacted.



  • Bogotá, Colombia Scotiabank Full-time

    Overview: Systems Reliability Engineer Lead role at Scotiabank. We are seeking a capable SRE Lead to collaborate with application, infrastructure and business teams to improve the stability, reliability and efficiency of our regional systems using Site Reliability Engineering (SRE) principles. You will work cross-functionally with various...


  • Bogotá, Colombia Scotiabank Full-time

    As a member of the International Banking Systems Reliability Office team, the System Reliability Engineer (SRE) will collaborate with a team that works with application teams, infrastructure teams, and business partners to continuously improve the stability, reliability and efficiency of our regional systems through Site Reliability Engineering (SRE)...


  • Bogotá, Colombia Exari Systems Full-time

    Lead Site Reliability Engineer Coupa makes margins multiply through its community-generated AI and industry-leading total spend management platform for businesses large and small. Coupa AI is informed by trillions of dollars of direct and indirect spend data across a global network of 10M+ buyers and suppliers. We empower you with the ability to predict,...


  • Bogotá, Bogotá D.E., Colombia Scotiabank Full-time

    Requisition ID: 237005 Employee Referral Program - Potential Payout: $400,000.00 We are committed to investing in our employees and helping them continue their professional careers at ScotiaTech. As a member of the International Banking Systems Reliability Office team, the System Reliability Engineer (SRE) will collaborate with a team that will...


  • Bogotá, Colombia Scotiabank Full-time

    Requisition ID: 237005 Employee Referral Program - Potential Payout: $400,000.00 We are committed to investing in our employees and helping them continue their professional careers at ScotiaTech. You will work cross-functionally with a variety of teams and be a core contributor to all significant engineering services or solutions...


  • Bogotá, Distrito Capital, Colombia Coupa Software Full-time

    Coupa makes margins multiply through its community-generated AI and industry-leading total spend management platform for businesses large and small. Coupa AI is informed by trillions of dollars of direct and indirect spend data across a global network of 10M+ buyers and suppliers. We empower you with the ability to predict, prescribe, and automate smarter,...


  • Bogotá, Bogotá D.E., Colombia Exari Systems Full-time

    Coupa makes margins multiply through its community-generated AI and industry-leading total spend management platform for businesses large and small. Coupa AI is informed by trillions of dollars of direct and indirect spend data across a global network of 10M+ buyers and suppliers. We empower you with the ability to predict, prescribe, and...

  • Site Reliability Engineer

    2 weeks ago


    Bogotá, Bogotá D.E., Colombia CBL Solutions Full-time

    Role: Site Reliability Engineer. Location: Medellin or Bogota, Colombia. Contract position. Requirements: 8 years of relevant experience; B1 English speaker. Skills & Experience: 8 years of relevant experience. Expert-level knowledge of distributed systems and cloud infrastructure. Extensive experience with automation and orchestration tools. Deep understanding of...


  • Bogotá, Bogotá D.E., Colombia Masabi Full-time

    Introducing Masabi // At Masabi, we're driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank card to travel. Our Justride...


  • Bogotá, Colombia Coupa Full-time

    Lead Site Reliability Engineer (10929) Location: Bogota, D.C., Capital District, Colombia Why join Coupa Pioneering Technology: At Coupa, we're at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility in their spend. Collaborative Culture: We value collaboration and teamwork, and our...