Site Reliability Engineer

2 weeks ago


WorkFromHome, Colombia · Michael Page Colombia · Full-time

Build reliable, scalable systems through automation and engineering. Improve service stability using SLOs, monitoring, and incident response.

About our client
A U.S.-based e-commerce organization specializing in personalized products, operating high-volume digital platforms supported by global teams. The company emphasizes technology-driven operations, a strong customer experience, and scalable infrastructure to support rapid growth and large production capacity.

Description
Reliability & Performance
• Define and manage SLIs, SLOs, and error budgets.
• Improve system reliability, scalability, and resilience.
• Lead reliability reviews and proactively prevent incidents.
Observability & Monitoring
• Build and maintain monitoring, logging, and alerting.
• Ensure alerts are actionable and dashboards are effective.
• Implement distributed tracing.
Automation & Tooling
• Automate operational tasks to reduce toil.
• Build tools for reliability and automated remediation.
CI/CD & Deployments
• Improve CI/CD pipelines for safe deployments.
• Implement canary, blue/green, and rollback strategies.
• Ensure production readiness.
Incident Management
• Participate in on-call rotations.
• Lead incident response and post-incident reviews.
• Promote a blameless culture.
Cloud & Infrastructure
• Manage AWS/Azure cloud environments.
• Work with containers, serverless, and event-driven systems.
• Ensure scalable, secure, and cost-efficient infrastructure.
Infrastructure as Code
• Build and manage infrastructure using Terraform.
• Maintain automated, consistent provisioning.
Security & Compliance
• Embed security in CI/CD pipelines.
• Support audits and compliance activities.

Desired profile (m/f)
• 4+ years of experience in SRE, DevOps, or Platform Engineering.
• Strong software engineering mindset and programming/scripting skills (Python, Go, Bash, etc.).
• Hands-on experience with AWS or Azure cloud environments.
• Solid understanding of distributed systems and cloud-native architectures.
• Proficiency with Terraform and Infrastructure as Code practices.
• Experience defining and managing SLIs, SLOs, and error budgets.
• Strong background in observability: monitoring, logging, alerting, and tracing.
• Experience improving CI/CD pipelines and deployment strategies.
• Ability to lead incident response and conduct blameless postmortems.
• Familiarity with automation, reliability tooling, and reducing operational toil.
• Strong analytical and problem-solving skills.
• Excellent communication skills and the ability to partner with engineering teams.
• Proactive, detail-oriented, and focused on continuous improvement.
• Advanced English (B2-C1), required for daily communication with international teams.

What we offer
• 100% remote role from Colombia.
• Indefinite-term contract through Michael Page Colombia.
• Exposure to modern SRE practices, automation frameworks, resilience engineering, and cloud-native tooling.
• Professional growth through complex technical challenges and continuous learning.
• The chance to work with global teams and cutting-edge cloud technologies across AWS and Azure.



  • WorkFromHome, Colombia · Masabi · Full-time

    A leading fintech company is seeking a Lead Site Reliability Engineer to enhance system reliability. This remote role in Colombia involves designing reliable systems, contributing to incident response, and mentoring teams. Candidates should have substantial SRE or DevOps experience, particularly in AWS and infrastructure automation. A supportive and...


  • WorkFromHome, Colombia · AgileEngine · Full-time

    Site Reliability Engineer ID45689 at AgileEngine. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best...


  • WorkFromHome, Colombia · Masabi · Full-time

    Lead Site Reliability Engineer. Introducing Masabi: at Masabi, we’re driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank...


  • WorkFromHome, Colombia · AgileEngine · Full-time

    A leading software development company in Colombia is seeking a Site Reliability Engineer to shape secure and scalable cloud-native systems. You will design resilient AWS infrastructure, lead CI/CD pipeline development, and mentor teams in DevSecOps practices. This role emphasizes innovation and collaboration with a focus on automation and observability....

  • Site Reliability Engineer

    2 weeks ago


    WorkFromHome, Colombia · Canonical · Full-time

    Site Reliability Engineer. Canonical is a leading provider of open-source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. With customers that include the world's leading public...


  • WorkFromHome, Colombia · BairesDev · Full-time

    A technology solutions company is seeking a Site Reliability Engineer to manage cloud infrastructure and automate deployments. Responsibilities include ensuring service availability, implementing CI/CD pipelines, and troubleshooting issues. Ideal candidates have experience with Kubernetes, Ansible, and CI/CD tools, along with advanced English skills. Enjoy a...


  • WorkFromHome, Colombia · BairesDev · Full-time

    A leading technology solutions provider is hiring a Site Reliability Engineer in Cartagena de Indias, Colombia. The ideal candidate will manage cloud infrastructure and implement CI/CD pipelines to automate deployments. The role offers fully remote work with flexible hours, competitive compensation, and a supportive environment for career growth and...


  • WorkFromHome, Colombia · Patagonian · Full-time

    Site Reliability Engineer - Sr. We are looking for a Senior SRE to join a team that works on a distributed architecture spanning physical machines, virtualized on-prem hosts, and cloud computing. The engineer will help centralize DevOps and help existing teams adopt best practices within our environment. The candidate will manage complex tasks that...


  • WorkFromHome, Colombia · Canonical · Full-time

    Senior Site Reliability Engineer – Canonical – Bogotá, D.C., Colombia. Canonical is a leading provider of open-source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our...


  • WorkFromHome, Colombia · Félix · Full-time

    Overview: At Félix, we're building the financial ecosystem for Latin immigrants in the U.S., starting with a revolution in remittances. Our core product is an AI-powered chatbot built on WhatsApp, allowing our users to send money home as easily as sending a text message. We leverage cutting-edge technology like AI, blockchain, and stablecoins to make...