Site Reliability Engineer
6 days ago
Site Reliability Engineer (AWS) - Technology at Truelogic Software

About Truelogic
At Truelogic, we are a leading provider of nearshore staff augmentation services headquartered in New York. For over two decades, we’ve been delivering top-tier technology solutions to companies of all sizes, from innovative startups to industry leaders, helping them achieve their digital transformation goals. Our team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects.

Our Client
A data-driven technology company that partners with high-growth brands to optimize customer acquisition and retention. It specializes in delivering high-LTV audiences and enrichment data to increase repeat purchase rates. The company collaborates with major platforms and agencies such as Shopify, Experian, TransUnion, and top media partners, all focused on driving profitable revenue growth.

Job Summary
The Site Reliability Engineer plays a key role in platform enablement by building and maintaining core infrastructure tooling that enables teams to deploy and operate services reliably using AWS and Kubernetes. This position focuses on managing and evolving internal Infrastructure as Code (IaC) constructs, primarily Python-based abstractions built with AWS CDK and CDK8s. These constructs encompass networking, EKS configuration, data stores, observability, autoscaling patterns, and deployment primitives. The engineer collaborates closely with backend teams to ensure infrastructure is secure, consistent, and easy to integrate, driving platform reliability and developer productivity.

Responsibilities
Design, implement, and evolve shared AWS CDK and CDK8s constructs used across multiple services and teams.
Maintain core infrastructure components including VPC, EKS clusters and node groups, RDS, OpenSearch, and MSK.
Operate and extend Kubernetes cluster addons such as ingress controllers, cert-manager, autoscalers, and monitoring/logging stacks.
Ensure high reliability through structured alerting systems (Prometheus, CloudWatch), autoscaling strategies, and recovery mechanisms.
Manage and publish baseline templates, configuration schemas, and comprehensive documentation for infrastructure usage.
Own the CI/CD pipelines for IaC codebases and platform component releases.
Collaborate with engineering teams to troubleshoot infrastructure-related issues and deliver scalable, reliable solutions.
Apply Site Reliability Engineering (SRE) principles, including SLIs, SLOs, observability, and fault tolerance, to all shared platform services.
Support IAM roles, secrets management, and tenant isolation best practices.

Qualifications and Job Requirements
5+ years of experience in infrastructure or Site Reliability Engineering (SRE), including hands-on work with AWS services such as VPC, IAM, RDS, MSK, and S3, as well as Kubernetes components like Helm, RBAC, and ServiceAccounts.
Fluency in Python and practical experience with Infrastructure as Code using AWS CDK, CDK8s, or equivalent frameworks such as Pulumi.
Strong understanding of Prometheus, Grafana, and effective alert routing practices.
Experience designing reusable infrastructure patterns or building internal developer platforms.
Proven track record of improving system reliability through automation, monitoring, and operational best practices.
Experience supporting Spark on Kubernetes, Argo, or Kafka-based batch pipelines.

What We Offer
100% Remote Work: Enjoy the freedom to work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
Highly Competitive USD Pay: Earn excellent, market-leading compensation in USD that goes beyond typical market offerings.
Paid Time Off: We value your well-being. Our paid time off policies ensure you have the chance to unwind and recharge when needed.
Work with Autonomy: Enjoy the freedom to manage your time as long as the work gets done. Focus on results, not the clock.
Work with Top American Companies: Grow your expertise working on innovative, high-impact projects with industry-leading U.S. companies.

Why You’ll Like Working Here
A Culture That Values You: We prioritize well-being and work-life balance, offering engagement activities and fostering dynamic teams to ensure you thrive both personally and professionally.
Diverse, Global Network: Connect with over 600 professionals in 25+ countries, expand your network, and collaborate with a multicultural team from Latin America.
Team Up with Skilled Professionals: Join forces with senior talent. All of our team members are seasoned experts, ensuring you’re working with the best in your field.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Location: Bogota, D.C., Capital District, Colombia
-
Remote Lead Site Reliability Engineer — Scale
6 days ago
WorkFromHome, Colombia Masabi Full-time
A leading fintech company is seeking a Lead Site Reliability Engineer to enhance system reliability. This remote role in Colombia involves designing reliable systems, contributing to incident response, and mentoring teams. Candidates should have substantial SRE or DevOps experience, particularly in AWS and infrastructure automation. A supportive and...
-
Senior Site Reliability Engineer — Remote
1 week ago
WorkFromHome, Colombia Truelogic Full-time
A leading technology firm in Colombia seeks a Site Reliability Engineer to enhance the reliability of systems on AWS and Kubernetes. The role emphasizes observability and automated responses to system behavior. Candidates should have over five years of experience in SRE roles and expertise in AWS and Kubernetes. This position offers fully remote work,...
-
Senior Site Reliability Engineer | Automation
2 weeks ago
WorkFromHome, Colombia NiCE Full-time
A global technology company is seeking a Senior Site Reliability Engineer in Medellín to enhance the reliability and scalability of its platform. This hybrid role offers ownership of critical systems and opportunities for professional growth with comprehensive company benefits. The ideal candidate has extensive experience in Linux and cloud infrastructure,...
-
Site Reliability Engineer ID45689
1 week ago
WorkFromHome, Colombia AgileEngine Full-time
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best...
-
Lead Site Reliability Engineer
6 days ago
WorkFromHome, Colombia Masabi Full-time
Introducing Masabi: At Masabi, we’re driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank...
-
Site Reliability Engineer
2 weeks ago
WorkFromHome, Colombia BairesDev Full-time
Overview: Site Reliability Engineer at BairesDev, remote work. We are looking for a Site Reliability Engineer to administer and provide support for the project infrastructure hosted in the cloud while implementing CI/CD pipelines for the automation of deployments. What You Will Do: Ensure high service availability, performance, security, and maintainability....
-
Site Reliability Engineer ID45689
2 weeks ago
WorkFromHome, Colombia AgileEngine Full-time
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to...
-
Site Reliability Engineer ID45689
2 weeks ago
WorkFromHome, Colombia AgileEngine Full-time
Why Join Us: AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in application development and AI/ML and have earned multiple Best Place to Work awards. If you're looking for a place to...
-
Site Reliability Engineer
6 days ago
WorkFromHome, Colombia Patagonian Full-time
Site Reliability Engineer - Sr. Looking for a Senior SRE to join a team that works on a distributed architecture, spanning physical machines and virtualizing on-prem host/cloud computing. The engineer will provide support centralizing DevOps and help existing teams adopt best practices within our environment. The candidate will manage complex tasks that...
-
Site Reliability Engineer
1 week ago
WorkFromHome, Colombia Truelogic Software LLC Full-time
About Truelogic: At Truelogic, we are a leading provider of nearshore staff augmentation services headquartered in New York. For over two decades, we’ve been delivering top-tier technology solutions to companies of all sizes, from innovative startups to industry leaders, helping them achieve their digital transformation goals. Our team of 600+ highly skilled...