Site Reliability Engineer

16 hours ago


Colombia, Huila Datavail Full-time

You will own reliability for core services across multiple clouds, drive automation, and mentor more junior engineers. You will partner with developer teams to embed resilience into feature delivery.

**Responsibilities**:

- Define and maintain SLIs/SLOs, monitor alignment and error budget usage
- Lead incident response and postmortems, implement corrective measures
- Automate operations tasks via tooling (e.g. auto-remediation, scaling rules)
- Build, improve, and maintain CI/CD pipelines, canary deployments, and blue/green strategies
- Lead technical discussions with customers to align on reliability, scalability, and performance requirements
- Drive continuous platform improvements across the service lifecycle, including architecture, monitoring, and operational processes
- Implement and extend observability systems (metrics, tracing, log aggregation)
- Optimize performance and cost by tuning cloud services, autoscaling, resource rightsizing
- Design, deploy, and operate containerized workloads using Docker and Kubernetes in production environments
- Collaborate with dev teams to integrate resilience patterns (circuit breakers, bulkheading)
- Participate in architecture discussions around high availability and disaster recovery
- Mentor mid-level and junior SREs; conduct reliability design reviews

**Must-have Qualifications**:
- 5-8 years of experience in a reliability or operations role
- **Cloud-agnostic certification**: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation
- **Cloud provider certification**: Professional-level certification in AWS (Solutions Architect), Azure (Solutions Architect Expert), GCP (Professional Cloud Architect), or Oracle Cloud (Architect Professional)
- Solid coding skills (Python, Go, or equivalent)
- Experience with IaC, CI/CD pipelines, and monitoring/observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Jaeger)
- Experience working with distributed systems and production-scale services

**Nice-to-have Skills**:
- Exposure to multi-cloud data replication or cross-cloud networks
- Experience with chaos engineering or fault injection



  • Colombia, Huila Groupon Full-time

    Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms...

  • Azure DevOps Engineer

    1 week ago


    Colombia Axiom Path Inc Full-time

    **Azure DevOps Engineer / Site Reliability Engineer** **Contract, 100% REMOTE** - In this role, you will leverage your DevOps expertise to design, automate, and streamline the software development lifecycle while playing a crucial role in maintaining website uptime. This role requires a strong ability to handle emergencies, troubleshoot website outages, and...


  • Colombia, Huila Datavail Full-time

    At least 2 years of hands-on experience with AWS - We require at least one AWS associate level certification. - Able to contribute through CloudFormation / Terraform - Good knowledge of AWS core services related to Infrastructure (EC2, ECS, EKS, RDS, EBS etc.), Networking (VPC, Network Security Groups, Peering, Transit Gateway, site-to-site VPN etc.),...

  • Reliability Engineer

    1 week ago


    Colombia, Huila Baker Hughes Full-time

    Role Description **Reliability Engineer** **Summary** Can work with limited supervision on assigned tasks with standard techniques to build on basic knowledge and develop skills in specific practice areas. Interacts with clients and client organisations and has an understanding of how maintenance management is executed. Understands project management...


  • Colombia Felix Technologies, Inc. Full-time

    About Us At Félix, we're building the financial ecosystem for Latin immigrants in the U.S., starting with a revolution in remittances. Our core product is an AI-powered chatbot built on WhatsApp, allowing our users to send money home as easily as sending a text message. We leverage cutting-edge technology like AI, blockchain, and stablecoins to make...


  • Colombia MAS Global Consulting Full-time

    Who We Are At MAS Global Consulting, we bring together diverse engineering talent and meaningful work opportunities with global clients who value innovation, quality, and people-first collaboration. Our mission is to help organizations build scalable, modern, and resilient platforms while enabling our consultants to grow in their careers. We are proud to...


  • Colombia Kyndryl Colombia SAS Full-time

    **Why Kyndryl** Kyndryl is a market leader that thinks and acts like a start-up. We design, build, manage, and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our...


  • Colombia Yuxi Global Full-time

    Company Description Yuxi Global is an American company with high functional teams across Latin America. We stay updated with the most modern, edge practices and technologies. Our teams are versatile, adaptable and have expertise in a wide range of programming languages, databases and frameworks. This is your invitation to someone who loves working with the...


  • Colombia Second Spectrum Full-time

    **QA & Reliability Engineer (Remote)**: at Second Spectrum Medellín, Antioquia, Colombia **Second Spectrum** **is a Sports Emmy-winning data & tech company** that is building _the next way of seeing sports_ - by capturing and producing the highest quality data and innovative content for many of the world’s largest leagues and media partners, such as...