Lead Site Reliability Engineer
2 weeks ago
Introducing Masabi
// At Masabi, we're driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank card to travel.
Our Justride platform is used in over 250 locations globally, including some of the largest cities in the world. With our industry-first mobile ticketing SDK, we've partnered with large players in the transport space, including Uber, Moovit and Transit.
Your own journey is important to us too. Choosing a role here means joining a network of innovators from all walks of life; a group of passionate individuals who consistently deliver. Here, you'll find the tools you need to build the career you want. Whether you're taking the direct route or trying a new path, we'll support you no matter what.
The Role_
// We're looking for a Lead Site Reliability Engineer to join our platform team, someone who's confident working hands-on with infrastructure, but also ready to shape how we scale and operate as a global team.
Location_
This role is fully remote and open to candidates based in Colombia.
What You'll Be Doing_
Build and automate reliable systems
Lead design discussions and make key architectural decisions for reliability, scalability, and performance.
Establish SRE standards and best practices (IaC patterns, CI/CD maturity, observability, etc.) across teams.
Design and manage infrastructure using Terraform and CloudFormation
Build and evolve CI/CD pipelines that support fast, safe, and frequent deployments
Automate manual tasks to reduce operational load and enable faster delivery
Help expand our infrastructure globally, scaling up new environments with care
Improve visibility, scale and performance
Define and maintain SLIs, SLOs, and alerting strategies aligned with user experience
Implement monitoring solutions that give us clear, early signals during incidents
Lead capacity planning and performance tuning as our systems and teams grow
Identify opportunities to improve architecture for resilience and cost-effectiveness
Own reliability and incident response
Lead or contribute to incident response, root cause analysis, and post-incident reviews
Design and maintain disaster recovery and failover strategies
Partner with compliance and security teams to meet frameworks like SOC 2 and PCI
Support others and share your knowledge
Collaborate with engineers, architects, and product teams to embed SRE practices from the start and define long-term platform reliability strategy
Mentor others in areas like observability, incident readiness, and infrastructure-as-code
Document systems and processes clearly to support learning and long-term success
Take part in the on-call rotation, which is shared across the team and paid on top of salary
About You_
// You're an experienced SRE who combines technical depth with curiosity, care, and a desire to make things better for the platform, the team, and the people using our systems.
You've worked in SRE, platform, or DevOps roles where reliability was business-critical (24/7)
You have proven experience designing and evolving production-grade systems for scale and resilience.
You're comfortable designing and operating in AWS, with strong knowledge of cloud architecture, networking and security (VPC design, IAM, least privilege)
You have hands-on experience with Terraform, infrastructure automation, and CI/CD systems
You've led or contributed to high-impact projects involving observability, performance, incident command, or reliability (distributed tracing, log correlation, metrics maturity, etc.)
You communicate clearly and drive cross-functional reliability improvements in distributed, async-first teams
You enjoy helping others grow and value a kind, collaborative engineering culture
You take pride in doing things the right way, but you're pragmatic and focused on impact
Nice To Have_
Familiarity with PCI DSS v4 or similar compliance standards
Experience with container orchestration
AWS certifications
Our Tech Stack_
// Our platform is JVM-based and cloud-native, running on AWS. The SRE team works across both modern infrastructure and legacy systems as we continue to scale globally.
We use a range of proven tools to support performance, reliability, and speed of delivery:
Monitoring & Observability: Grafana, Prometheus, CloudWatch, Pingdom, Kibana
Infrastructure as Code: Terraform, CloudFormation
CI/CD & Automation: GitLab CI, Rundeck
Configuration Management & Logging: Puppet, Confluent Cloud
Careers at Masabi are for people going places - driven by a mission to make transit fair and accessible for all.
We are a network of innovators from all walks of life, passionate about making a difference. At Masabi, we operate with openness and trust, creating an environment where everyone feels empowered to bring their whole, authentic selves to work.
Whoever you are, just be yourself.
We welcome applications from underrepresented backgrounds and encourage you to share your pronouns at any stage. Together, we simplify journeys, remove barriers, and improve daily life for millions.
Why Join Masabi?
Driven by Purpose – We believe in journeys made simple. The work isn't always easy, but the best things never are.
Encouraged to Accelerate – Masabi is going places and our people are in the driving seat. Whether you're taking the direct route or exploring new paths, we support your journey.
Advancing with Empathy – We put people first and foster a culture of learning, not blame. No matter your cargo, we share the load.
We're already powering journeys - are you ready to join us?