Site Reliability Engineer
Posted 6 days ago
You will own reliability for core services across multiple clouds, drive automation, and mentor more junior engineers. You will partner with developer teams to embed resilience into feature delivery.
**Responsibilities**:
- Define and maintain SLIs/SLOs; track SLO compliance and error budget consumption
- Lead incident response and postmortems, implement corrective measures
- Automate operations tasks via tooling (e.g. auto-remediation, scaling rules)
- Build, improve, and maintain CI/CD pipelines, canary deployments, blue/green strategies
- Lead technical discussions with customers to align on reliability, scalability, and performance requirements
- Drive continuous platform improvements across the service lifecycle, including architecture, monitoring, and operational processes
- Implement and extend observability systems (metrics, tracing, log aggregation)
- Optimize performance and cost by tuning cloud services, autoscaling, resource rightsizing
- Design, deploy, and operate containerized workloads using Docker and Kubernetes in production environments
- Collaborate with dev teams to integrate resilience patterns (circuit breakers, bulkheads)
- Participate in architecture discussions on high availability and disaster recovery
- Mentor mid-level and junior SREs; conduct reliability design reviews
**Must-have Qualifications**:
- 5-8 years of experience in a reliability or operations role
- Cloud-agnostic certification: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation
- Cloud provider certification: Professional-level certification in AWS (Solutions Architect), Azure (Solutions Architect Expert), GCP (Professional Cloud Architect), or Oracle Cloud (Architect Professional)
- Solid coding skills (Python, Go, or equivalent)
- Experience with IaC, CI/CD pipelines, and monitoring/observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Jaeger)
- Experience working with distributed systems and production-scale services
**Nice-to-have Skills**:
- Exposure to multi-cloud data replication or cross-cloud networks
- Experience with chaos engineering or fault injection