Site Reliability Engineer

6 days ago

Colombia, Huila Datavail Full-time

You will own reliability for core services across multiple clouds, drive automation, and mentor more junior engineers. You will partner with developer teams to embed resilience into feature delivery.

**Responsibilities**:

- Define and maintain SLIs/SLOs; monitor SLO compliance and error budget usage
- Lead incident response and postmortems, implement corrective measures
- Automate operations tasks via tooling (e.g. auto-remediation, scaling rules)
- Build, improve, and maintain CI/CD pipelines, canary deployments, blue/green strategies
- Lead technical discussions with customers to align on reliability, scalability, and performance requirements
- Drive continuous platform improvements across the service lifecycle, including architecture, monitoring, and operational processes
- Implement and extend observability systems (metrics, tracing, log aggregation)
- Optimize performance and cost by tuning cloud services, autoscaling, resource rightsizing
- Design, deploy, and operate containerized workloads using Docker and Kubernetes in production environments
- Collaborate with dev teams to integrate resilience patterns (circuit breakers, bulkheading)
- Participate in architecture discussions around high availability, disaster recovery
- Mentor mid-level and junior SREs; conduct reliability design reviews

**Must-have Qualifications**:

- 5-8 years of experience in a reliability or operations role
- Cloud-agnostic certification: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation
- Cloud provider certification: Professional-level certification in AWS (Solutions Architect), Azure (Solutions Architect Expert), GCP (Professional Cloud Architect), or Oracle Cloud (Architect Professional)
- Solid coding skills (Python, Go, or equivalent)
- Experience with IaC and CI/CD pipelines
- Comfortable with monitoring/observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Jaeger)
- Experience working with distributed systems and production-scale services

**Nice-to-have Skills**:

- Exposure to multi-cloud data replication or cross-cloud networks
- Experience with chaos engineering or fault injection

Remote - Edge Site Reliability Engineer (SRE)

6 days ago

Colombia, Huila GSB Solutions Full-time

A major company is hiring: **Edge Site Reliability Engineer (SRE) - Remote in Colombia** **Main Activities / Responsibilities**: - Guarantee overall system uptime, focusing on availability to comply with the defined SLA, SLOs, and SLIs. ... right SLI and SLO and identifying significant projects that result in substantial cost savings or revenues. - Spend...
Senior Site Reliability Engineer

5 minutes ago

Colombia MAS Global Consulting Full-time

Who We Are: At MAS Global Consulting, we are a premium digital engineering partner delivering technology solutions to some of the world's most innovative companies, from high-growth startups to Fortune 500 enterprises. With a people-first culture and a commitment to excellence, we combine nearshore talent, agile delivery, and technical depth to build...
Senior Site Reliability Engineer

2 weeks ago

Colombia MAS Global Consulting LLC Full-time

Senior Site Reliability Engineer (SRE) | LATAM. At MAS Global Consulting, we are a premium digital engineering partner delivering technology solutions to some of the world’s most innovative companies, from high-growth startups to Fortune 500 enterprises. With a people-first culture and a commitment to excellence, we combine nearshore talent, agile...
Site Reliability Engineer

6 days ago

Colombia, Huila Datavail Full-time

**Technical Summary** You will support the reliability and scalability of services across AWS, Azure, GCP, and Oracle by executing automation, CI/CD, observability, and container orchestration tasks. You will work closely with senior engineers to ensure production systems are stable, well-monitored, and continuously improving. **Responsibilities** -...
Site Reliability Engineer

2 weeks ago

Colombia, Huila Datavail Full-time

At least 2 years of hands-on experience with AWS - We require at least one AWS associate level certification. - Able to contribute through CloudFormation / Terraform - Good knowledge of AWS core services related to Infrastructure (EC2, ECS, EKS, RDS, EBS etc.), Networking (VPC, Network Security Groups, Peering, Transit Gateway, site-to-site VPN etc.),...
Site Reliability Engineer

3 minutes ago

Colombia Felix Technologies, Inc. Full-time

About Us At Félix, we're building the financial ecosystem for Latin immigrants in the U.S., starting with a revolution in remittances. Our core product is an AI-powered chatbot built on WhatsApp, allowing our users to send money home as easily as sending a text message. We leverage cutting-edge technology like AI, blockchain, and stablecoins to make...
Site Reliability Engineer

2 minutes ago

Colombia MAS Global Consulting Full-time

Who We Are: At MAS Global Consulting, we bring together diverse engineering talent and meaningful work opportunities with global clients who value innovation, quality, and people-first collaboration. Our mission is to help organizations build scalable, modern, and resilient platforms while enabling our consultants to grow in their careers. We are proud to...
Site Reliability Engineer II-2

1 minute ago

Bogota, Colombia (Bldg ) Mastercard Full-time

Our Purpose: Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships...
