Site Reliability Engineer
16 hours ago
You will own reliability for core services across multiple clouds, drive automation, and mentor more junior engineers. You will partner with developer teams to embed resilience into feature delivery.
**Responsibilities**:
- Define and maintain SLIs/SLOs; monitor SLO compliance and error budget usage
- Lead incident response and postmortems, and implement corrective measures
- Automate operations tasks via tooling (e.g. auto-remediation, scaling rules)
- Build, improve, and maintain CI/CD pipelines, canary deployments, and blue/green strategies
- Lead technical discussions with customers to align on reliability, scalability, and performance requirements
- Drive continuous platform improvements across the service lifecycle, including architecture, monitoring, and operational processes
- Implement and extend observability systems (metrics, tracing, log aggregation)
- Optimize performance and cost by tuning cloud services, autoscaling, and resource rightsizing
- Design, deploy, and operate containerized workloads using Docker and Kubernetes in production environments
- Collaborate with dev teams to integrate resilience patterns (circuit breakers, bulkheading)
- Participate in architecture discussions around high availability and disaster recovery
- Mentor mid-level and junior SREs; conduct reliability design reviews
**Must-have Qualifications**:
- 5-8 years of experience in a reliability or operations role
- **Cloud-agnostic certification**: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation
- **Cloud provider certification**: Professional-level certification in AWS (Solutions Architect), Azure (Solutions Architect Expert), GCP (Professional Cloud Architect), or Oracle Cloud (Architect Professional)
- Solid coding skills (Python, Go, or equivalent)
- Experience with IaC and CI/CD pipelines
- Comfortable with monitoring/observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Jaeger)
- Experience working with distributed systems and production-scale services
**Nice-to-have Skills**:
- Exposure to multi-cloud data replication or cross-cloud networks
- Experience with chaos engineering or fault injection
-
Principal Site Reliability Engineer
4 days ago
Colombia, Huila | Groupon | Full-time
Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms...
-
Azure DevOps Engineer
1 week ago
Colombia | Axiom Path Inc | Full-time
**Azure DevOps Engineer / Site Reliability Engineer** **Contract, 100% REMOTE** - In this role, you will leverage your DevOps expertise to design, automate, and streamline the software development lifecycle while playing a crucial role in maintaining website uptime. This role requires a strong ability to handle emergencies, troubleshoot website outages, and...
-
Site Reliability Engineer
2 days ago
Colombia, Huila | Datavail | Full-time
At least 2 years of hands-on experience with AWS - We require at least one AWS associate-level certification. - Able to contribute through CloudFormation / Terraform - Good knowledge of AWS core services related to Infrastructure (EC2, ECS, EKS, RDS, EBS etc.), Networking (VPC, Network Security Groups, Peering, Transit Gateway, site-to-site VPN etc.),...
-
Reliability Engineer
1 week ago
Colombia, Huila | Baker Hughes | Full-time
Role Description **Reliability Engineer** **Summary** Can work with limited supervision on assigned tasks with standard techniques to build on basic knowledge and develop skills in specific practice areas. Interacts with clients and client organisations and has an understanding of how maintenance management is executed. Understands project management...
-
Site Reliability Engineer
6 days ago
Colombia | Felix Technologies, Inc. | Full-time
About Us: At Félix, we're building the financial ecosystem for Latin immigrants in the U.S., starting with a revolution in remittances. Our core product is an AI-powered chatbot built on WhatsApp, allowing our users to send money home as easily as sending a text message. We leverage cutting-edge technology like AI, blockchain, and stablecoins to make...
-
Senior Site Reliability Engineer
4 days ago
Colombia, Huila | Groupon | Full-time
Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms...
-
Site Reliability Engineer
4 days ago
Colombia | MAS Global Consulting | Full-time
Who We Are: At MAS Global Consulting, we bring together diverse engineering talent and meaningful work opportunities with global clients who value innovation, quality, and people-first collaboration. Our mission is to help organizations build scalable, modern, and resilient platforms while enabling our consultants to grow in their careers. We are proud to...
-
Infrastructure Services Site Reliability Engineer
2 weeks ago
Colombia | Kyndryl Colombia SAS | Full-time
**Why Kyndryl** Kyndryl is a market leader that thinks and acts like a start-up. We design, build, manage, and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our...
-
Senior Site Reliability Engineer
2 weeks ago
Colombia | Yuxi Global | Full-time
Company Description: Yuxi Global is an American company with high-functioning teams across Latin America. We stay updated with the most modern, cutting-edge practices and technologies. Our teams are versatile, adaptable and have expertise in a wide range of programming languages, databases and frameworks. This is your invitation to someone who loves working with the...
-
QA & Reliability Engineer (Remote)
2 days ago
Colombia | Second Spectrum | Full-time
**QA & Reliability Engineer (Remote)**: at Second Spectrum Medellín, Antioquia, Colombia **Second Spectrum** **is a Sports Emmy-winning data & tech company** that is building _the next way of seeing sports_ - by capturing and producing the highest quality data and innovative content for many of the world’s largest leagues and media partners, such as...