Site Reliability Engineer

6 days ago

WorkFromHome, Colombia Félix Full-time

About Us

At Félix, we're building the financial ecosystem for Latin immigrants in the U.S., starting with a revolution in remittances. Our core product is an AI-powered chatbot built on WhatsApp, allowing our users to send money home as easily as sending a text message. We leverage cutting-edge technology like AI, blockchain, and stablecoins to make cross-border payments faster, more affordable, and more accessible than ever before.

We are a hyper-growth Series B company, backed by over $100 million in funding from top-tier global investors, including QED, Castle Island, Switch Ventures, HTwenty, Monashees, and General Catalyst Customer Value Fund. Félix was selected as an “Endeavour Entrepreneur” and was a recipient of the CrossTech Fintech Startups Award.

We are a group of extremely talented and dedicated high-performers, united by our shared obsession with a single goal: empowering our customers. We are all owners of Félix, driven by a bias for action and a true experimentation spirit to get shit done with urgency and focus. Joining Félix means you will be part of a team building a legacy, a company that will outlive us all. This is a rare opportunity to apply your skills to a deeply meaningful mission: serving a community that has been underserved for too long.

We are a team that is fiercely loyal to each other, where radical transparency and constructive feedback are how we grow and push for excellence. We are bold; we care less about what others are doing and more about creating sustainable value and a product that truly makes our users' lives better. We are building the future, today.

About The Role

We’re looking for a Site Reliability Engineer (SRE) to join our Engineering Operations team, reporting directly to Damian Finol, Head of EngOps. This is a new role focused on strengthening the reliability, scalability, and security of the infrastructure that powers our fintech platform.
You’ll work closely with Engineering and SecOps to ensure our systems are highly available, observable, and cost-efficient. The role blends software engineering, systems operations, and security practices, with a strong emphasis on automation, proactive monitoring, and continuous improvement.

Responsibilities

Manage and optimize our infrastructure on Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE).
Automate provisioning and configuration using Terraform, Helm, and scripting languages such as Go, Python, and Bash.
Build, maintain, and improve monitoring and alerting systems using Prometheus, Grafana, and centralized logging tools (e.g., ELK or Loki).
Participate in on-call rotations, incident response, and post-mortem analyses, ensuring rapid recovery and continuous learning from failures.
Define and track SLOs/SLIs and error budgets to monitor service health and performance.
Implement cloud security best practices to protect sensitive data and maintain the integrity of our systems.
Collaborate across Engineering, Security, and Product teams to embed reliability and automation in every phase of development and deployment.
Contribute to GKE cost optimization and resource management strategies to enhance efficiency and control operational spend.

Requirements

4+ years of experience as an SRE, DevOps, Infrastructure, or Platform Engineer.
Strong hands-on experience with GCP and GKE.
Proficiency in Kubernetes (architecture, deployments, networking, and troubleshooting).
Solid programming or scripting skills in Go, Python, or Bash.
Experience with Terraform and Helm for Infrastructure as Code.
Strong understanding of monitoring and observability using Prometheus, Grafana, and logging frameworks.
Familiarity with incident management, on-call operations, and post-mortem processes.
Knowledge of network fundamentals (TCP/IP, DNS, load balancing).
Experience with PostgreSQL or distributed databases.
Awareness of FinOps and cloud cost management principles.
Excellent problem-solving, communication, and collaboration skills, with a proactive mindset.
Certified Kubernetes Administrator (CKA).
Experience in FinOps, cloud security, or regulated industries.
Familiarity with PagerDuty or similar incident management tools.
Background implementing SLOs/SLIs and error budgets in production environments.

These are the applicable requisites, although equivalent competencies in any of the above will also be considered.

What We Offer

Competitive salary
Initial stock options grant
Annual performance bonus
Health, dental, and vision plans
Remote work environment, although we have offices in Miami and Mexico City and would love to work in a hybrid model if you are up for it
Continuous learning opportunities
Unlimited PTO
Paid parental leave
Empowering opportunities for growth in a dynamic entrepreneurial environment

Equal Opportunity Employer

At Félix, we are committed to providing equal employment opportunities to all qualified employees and applicants without regard to race, religion, nationality, sex, sexual orientation, gender identity, age, or disability. This policy applies to all terms and conditions of employment, including recruitment, hiring, placement, promotion, training, compensation, benefits, and termination.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology

Senior Site Reliability Engineer — Remote

7 hours ago

WorkFromHome, Colombia Truelogic Software LLC Full-time

A leading software development firm based in Colombia is looking for a Site Reliability Engineer to enhance the reliability of their AWS and Kubernetes systems. The engineer will focus on observability, operational improvements, and collaborate with various engineering teams. This position offers 100% remote work and a highly competitive USD salary, along...
Site Reliability Engineer

4 days ago

WorkFromHome, Colombia Epsilon Solutions Ltd. SA de CV. Full-time

Sr. Site Reliability Engineer. Location: Colombia (REMOTE). Employment type: Full Time Contract. Key Skills: Microsoft Technologies, IIS, Azure, AWS, Kubernetes (K8s), CI/CD Pipeline – Git Action, IaC – CloudFormation, Terraform, Monitoring – Grafana, Troubleshooting in SRE (preferred engineering background). Responsibilities: 80% – Production support under...
Site Reliability Engineer: Microservices Onboarding

4 days ago

WorkFromHome, Colombia N-iX Full-time

A leading technology firm located in Bogotá, Colombia is seeking a Site Reliability Engineer to enhance the reliability and scalability of software production environments, especially in onboarding new microservices. Responsibilities include automating workflows, managing service reliability, and collaborating across teams. The ideal candidate has strong...
Site Reliability Engineer

2 days ago

WorkFromHome, Colombia BairesDev Full-time

Overview Site Reliability Engineer at BairesDev. We are looking for a Site Reliability Engineer to build and maintain highly reliable, scalable, and secure OpenShift/Kubernetes clusters. Approach production systems from a software engineering perspective with a focus on automation and reliability. What you will do Build and automate and maintain...
Site Reliability Engineer

6 days ago

WorkFromHome, Colombia Blankfactor Full-time

This is a remote position as a full-time Colombia employee paid in COP. It requires a minimum B2 level of English comprehension; please be sure to apply with your English CV. We are seeking a proactive and experienced Site Reliability Engineer (SRE) to join our team, focusing on maximizing the reliability, availability, and performance of our enterprise...
Site Reliability Engineer

4 days ago

WorkFromHome, Colombia N-iX Full-time

N-iX, Bogota, D.C., Capital District, Colombia. Overview: Site Reliability Engineer (SRE) to help monitor, maintain, and scale software production environments, with a primary focus on onboarding new microservices. Work closely with development and platform teams to automate and program-manage the onboarding lifecycle, from requirements and environment setup...
Senior Site Reliability Engineer — Cloud

4 days ago

WorkFromHome, Colombia AgileEngine Full-time

A leading software development company in Colombia is seeking a Site Reliability Engineer to design and deploy scalable cloud-native systems. The ideal candidate has over 8 years of experience in SRE, is highly proficient in AWS and Terraform, and excels in CI/CD pipelines. The role involves mentoring teams, improving system reliability, and implementing...
Remote Site Reliability Engineer — SRE

4 days ago

WorkFromHome, Colombia Epsilon Solutions Ltd. SA de CV. Full-time

A leading technology solutions provider is seeking a Senior Site Reliability Engineer to provide production support and drive DevOps activities. This remote position focuses on troubleshooting issues in production and maintaining CI/CD pipelines while leveraging Microsoft technologies, AWS, and Kubernetes. Ideal candidates have strong skills in production...
Site Reliability Engineer

2 days ago

WorkFromHome, Colombia EPAM Systems Full-time

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most...
Site Reliability Engineer

8 hours ago

WorkFromHome, Colombia BairesDev Full-time

Overview Site Reliability Engineer at BairesDev – Remote work We are looking for a Site Reliability Engineer to administer and provide support for the project infrastructure hosted in the cloud while implementing CI/CD pipelines for the automation of deployments. What You Will Do Ensure high service availability, performance, security, and maintainability....
