Senior Site Reliability Engineer
2 weeks ago
Senior Site Reliability Engineer – Canonical – Bogotá, D.C., Colombia

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include the world’s leading public cloud and silicon providers, and industry leaders in many sectors. The company is a pioneer of global distributed collaboration, with 1,200+ colleagues in 75+ countries and very few office‑based roles.

About the Role

The Senior Site Reliability Engineer will lead next‑gen operations at scale, applying pure Python infra‑as‑code from bare metal to containers and applications. The goal is to perfect enterprise infrastructure DevOps, running hundreds of private cloud, Kubernetes and application clusters for customers across the physical and public cloud estate. The role requires a scientific approach to operations driven by metrics and code, and the ability to learn the entire stack, from bare metal networking and kernel to serverless and open source applications.

Responsibilities

Bring Python software‑engineering skills and rigour to the operations domain.
Practice DevSecOps from bare metal to application, designing and running OpenStack, Kubernetes and software‑defined storage.
Enable DevSecOps for applications running on the infrastructure.
Operate mission‑critical services for global brand‑name customers in a high‑pressure environment.
Stay current with the latest capabilities in open source infrastructure and drive upgrades to keep customers on the best solutions.

Desired Experience and Skills

Degree in Software Engineering or Computer Science.
Experience with Linux and familiarity with Linux networking and storage.
Python software development expertise.
Operational experience in a large‑scale, high‑availability environment.
Excellent interpersonal skills, curiosity, flexibility, and accountability.
Ability to travel internationally twice a year for company events up to two weeks long.

Nice‑to‑have Skills

Experience with OpenStack or Kubernetes deployment or operations.
Familiarity with public or private cloud management.

Benefits

Distributed work environment with twice‑yearly team sprints in person.
Personal learning and development budget of USD 2,000 per year.
Annual compensation review.
Recognition rewards.
Annual holiday leave.
Maternity and paternity leave.
Employee Assistance Programme.
Opportunity to travel to new locations to meet colleagues.
Priority Pass and travel upgrades for long‑haul company events.

About Canonical

Canonical is a pioneering tech firm at the forefront of the global move to open source. As the company that publishes Ubuntu, one of the most important open source projects and the platform for AI, IoT and the cloud, we are changing the world of software. We recruit on a global basis and set a very high standard for people joining the company. Most colleagues at Canonical have worked from home since its inception in 2004.

Canonical is an Equal Opportunity Employer

We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background creates a better work environment and better products. Whatever your identity, we will give your application fair consideration.

Employment Information

Seniority Level: Mid‑Senior level
Employment Type: Full‑time
Job Function: Engineering and Information Technology
Industries: Software Development
-
Senior Site Reliability Engineer — Remote
2 weeks ago
WorkFromHome, Colombia. Truelogic Software LLC. Full-time.
A leading software development firm based in Colombia is looking for a Site Reliability Engineer to enhance the reliability of its AWS and Kubernetes systems. The engineer will focus on observability and operational improvements, and will collaborate with various engineering teams. This position offers 100% remote work and a highly competitive USD salary, along...
-
Senior Site Reliability Engineer — Cloud
2 weeks ago
WorkFromHome, Colombia. AgileEngine. Full-time.
A leading software development firm in Colombia is seeking an experienced Site Reliability Engineer (SRE) to enhance the reliability and efficiency of cloud-native systems. You will work closely with cross-functional teams, focusing on resilient AWS infrastructure and DevSecOps practices. Candidates should possess 8–10 years of experience in infrastructure or...
-
Remote Lead Site Reliability Engineer — Scale
7 days ago
WorkFromHome, Colombia. Masabi. Full-time.
A leading fintech company is seeking a Lead Site Reliability Engineer to enhance system reliability. This remote role in Colombia involves designing reliable systems, contributing to incident response, and mentoring teams. Candidates should have substantial SRE or DevOps experience, particularly in AWS and infrastructure automation. A supportive and...
-
Site Reliability Engineer ID45689
1 week ago
WorkFromHome, Colombia. AgileEngine. Full-time.
AgileEngine is an Inc. 5000 company that creates award‑winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people‑first culture has earned us multiple Best...
-
Senior Site Reliability Engineer — Cloud
1 week ago
WorkFromHome, Colombia. AgileEngine. Full-time.
A leading software development company in Colombia is seeking a Site Reliability Engineer to shape secure and scalable cloud-native systems. You will design resilient AWS infrastructure, lead CI/CD pipeline development, and mentor teams in DevSecOps practices. This role emphasizes innovation and collaboration with a focus on automation and observability....
-
Senior Engineering Manager, Site Reliability
7 days ago
WorkFromHome, Colombia. Next League. Full-time.
As the Senior Manager of Site Reliability Engineering, you will be responsible for ensuring the reliability, scalability, and efficiency of a wide range of client systems, including organizations such as NASCAR, USOPC, and TGL....
-
Senior Site Reliability Engineer — Remote
2 weeks ago
WorkFromHome, Colombia. Truelogic. Full-time.
A leading nearshore staff augmentation firm in Bogotá seeks a Site Reliability Engineer to enhance the reliability of distributed systems on AWS and Kubernetes. Responsibilities include designing observability strategies, monitoring system behavior, and automating operational responses. The ideal candidate has over 5 years of experience in SRE/Platform...
-
Lead Site Reliability Engineer
7 days ago
WorkFromHome, Colombia. Masabi. Full-time.
Introducing Masabi: At Masabi, we’re driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank...
-
Senior Site Reliability Engineer
1 week ago
WorkFromHome, Colombia. Truelogic Software. Full-time.
A technology solutions provider seeks a Senior Reliability Engineer to enhance the reliability of distributed systems on AWS and Kubernetes. This fully remote role emphasizes observability, automated scaling, and operational excellence. Ideal candidates should have over 5 years of relevant experience, strong skills in AWS services, and a background in...
-
Site Reliability Engineer
2 weeks ago
WorkFromHome, Colombia. BairesDev. Full-time.
Overview: Site Reliability Engineer at BairesDev – Remote work. We are looking for a Site Reliability Engineer to administer and provide support for the project infrastructure hosted in the cloud while implementing CI/CD pipelines for the automation of deployments.
What You Will Do: Ensure high service availability, performance, security, and maintainability....