Site Reliability Engineer

4 days ago

Colombia Felix Technologies, Inc. Full-time

About Us
At Félix, we're building the financial ecosystem for Latin immigrants in the U.S., starting with a revolution in remittances. Our core product is an AI-powered chatbot built on WhatsApp, allowing our users to send money home as easily as sending a text message. We leverage cutting-edge technology like AI, blockchain, and stablecoins to make cross-border payments faster, more affordable, and more accessible than ever before.
We are a hyper-growth Series B company, backed by over $100 million in funding from top-tier global investors, including QED, Castle Island, Switch Ventures, HTwenty, Monashees, and General Catalyst Customer Value Fund. This isn't just about the numbers; it's a testament to the trust our investors have in our vision and our team. Additionally, Félix was selected as an "Endeavour Entrepreneur" and was a recipient of the CrossTech Fintech Startups Award. We are a group of extremely talented and dedicated high-performers, united by our shared obsession with a single goal: empowering our customers. We are all owners of Félix, driven by a bias for action and a true experimentation spirit to get shit done with urgency and focus.
Joining Félix means you will be part of a team building a legacy, a company that will outlive us all. This is a rare opportunity to apply your skills to a deeply meaningful mission—serving a community that has been underserved for too long. We are a team that is fiercely loyal to each other, where radical transparency and constructive feedback are how we grow and push for excellence. We are bold, we care less about what others are doing, and more about creating sustainable value and a product that truly makes our users' lives better. We are building the future, today.
About the Role
We're looking for a Site Reliability Engineer (SRE) to join our Engineering Operations team, reporting directly to Damian Finol, Head of EngOps. This is a new role focused on strengthening the reliability, scalability, and security of the infrastructure that powers our fintech platform. You'll work closely with Engineering and SecOps to ensure our systems are highly available, observable, and cost-efficient. The role blends software engineering, systems operations, and security practices, with a strong emphasis on automation, proactive monitoring, and continuous improvement.
Responsibilities
- Manage and optimize our infrastructure on Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE).
- Automate provisioning and configuration using Terraform, Helm, and scripting languages such as Go, Python, and Bash.
- Build, maintain, and improve monitoring and alerting systems using Prometheus, Grafana, and centralized logging tools (e.g., ELK or Loki).
- Participate in on-call rotations, incident response, and post-mortem analyses, ensuring rapid recovery and continuous learning from failures.
- Define and track SLOs/SLIs and error budgets to monitor service health and performance.
- Implement cloud security best practices to protect sensitive data and maintain the integrity of our systems.
- Collaborate across Engineering, Security, and Product teams to embed reliability and automation in every phase of development and deployment.
- Contribute to GKE cost optimization and resource management strategies to enhance efficiency and control operational spend.
Requirements
- 4+ years of experience as an SRE, DevOps, Infrastructure, or Platform Engineer.
- Strong hands-on experience with GCP and GKE.
- Proficiency in Kubernetes (architecture, deployments, networking, and troubleshooting).
- Solid programming or scripting skills in Go, Python, or Bash.
- Experience with Terraform and Helm for Infrastructure as Code.
- Strong understanding of monitoring and observability using Prometheus, Grafana, and logging frameworks.
- Familiarity with incident management, on-call operations, and post-mortem processes.
- Knowledge of network fundamentals (TCP/IP, DNS, load balancing).
- Experience with PostgreSQL or distributed databases.
- Awareness of FinOps and cloud cost management principles.
- Excellent problem-solving, communication, and collaboration skills, with a proactive mindset.
- Certified Kubernetes Administrator (CKA).
- Experience in FinOps, cloud security, or regulated industries.
- Familiarity with PagerDuty or similar incident management tools.
- Background implementing SLOs/SLIs and error budgets in production environments.

These are the applicable requisites, although equivalent competencies in any of the above will also be considered.
What We Offer
- Competitive salary
- Initial stock options grant
- Annual performance bonus
- Health, dental, and vision plans
- Remote work environment, although we have offices in Miami and México City and would love to work in a hybrid model if you are up for it.
- Continuous learning opportunities
- Unlimited PTO
- Paid parental leave
- Empowering opportunities for growth in a dynamic entrepreneurial environment
Equal Opportunity Employer
At Félix, we are committed to providing equal employment opportunities to all qualified employees and applicants without regard to race, religion, nationality, sex, sexual orientation, gender identity, age, or disability. This policy applies to all terms and conditions of employment, including recruitment, hiring, placement, promotion, training, compensation, benefits, and termination.
Want to learn more about our privacy practices? Check out our Privacy Policy.

Azure DevOps Engineer

7 days ago

Colombia Axiom Path Inc Full-time

**Azure DevOps Engineer / Site Reliability Engineer** **Contract, 100% REMOTE** - In this role, you will leverage your DevOps expertise to design, automate, and streamline the software development lifecycle while playing a crucial role in maintaining website uptime. This role requires a strong ability to handle emergencies, troubleshoot website outages, and...
Principal Site Reliability Engineer

2 days ago

Colombia, Huila Groupon Full-time

Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms...
Site Reliability Engineer

2 days ago

Colombia MAS Global Consulting Full-time

Who We Are: At MAS Global Consulting, we bring together diverse engineering talent and meaningful work opportunities with global clients who value innovation, quality, and people-first collaboration. Our mission is to help organizations build scalable, modern, and resilient platforms while enabling our consultants to grow in their careers. We are proud to...
Infrastructure Services Site Reliability Engineer

1 week ago

Colombia Kyndryl Colombia SAS Full-time

**Why Kyndryl** Kyndryl is a market leader that thinks and acts like a start-up. We design, build, manage, and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our...
Site Reliability Engineer

3 hours ago

Colombia, Huila Datavail Full-time

At least 2 years of hands-on experience with AWS - We require at least one AWS associate level certification. - Able to contribute through CloudFormation / Terraform - Good knowledge of AWS core services related to Infrastructure (EC2, ECS, EKS, RDS, EBS etc.), Networking (VPC, Network Security Groups, Peering, Transit Gateway, site-to-site VPN etc.),...
Reliability Engineer

2 weeks ago

Colombia Neostella Full-time

**TO BE CONSIDERED, CANDIDATES MUST HAVE A TOURIST VISA TO TRAVEL TO THE UNITED STATES** **What you’ll do**: Our manufacturing client is looking for a Reliability Engineer. The Reliability Engineer is a hands-on technical role that is critical in leading all engineering efforts in collaboration with the operating teams & unit operations across multiple...
Reliability Engineer

7 days ago

Colombia, Huila Baker Hughes Full-time

Role Description **Reliability Engineer** **Summary** Can work with limited supervision on assigned tasks with standard techniques to build on basic knowledge and develop skills in specific practice areas. Interacts with clients and client organisations and has an understanding of how maintenance management is executed. Understands project management...
Senior Site Reliability Engineer

2 days ago

Colombia, Huila Groupon Full-time

Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms...
Senior Site Reliability Engineer

1 week ago

Colombia Yuxi Global Full-time

Company Description: Yuxi Global is an American company with highly functional teams across Latin America. We stay up to date with the most modern, cutting-edge practices and technologies. Our teams are versatile and adaptable, with expertise in a wide range of programming languages, databases, and frameworks. This is your invitation to someone who loves working with the...
Site Reliability Engineer

2 weeks ago

Colombia Rockwell Automation Full-time

Rockwell Automation is a global technology leader focused on helping the world’s manufacturers be more productive, sustainable, and agile. With more than 25,000 employees who make the world better every day, we know we have something special. Behind our customers - amazing companies that help feed the world, provide life-saving medicine on a global scale,...
