Site Reliability Engineer

2 weeks ago


WorkFromHome, Colombia Patagonian Full time

Site Reliability Engineer - Sr

Looking for a Senior SRE to join a team that works on a distributed architecture spanning physical machines, virtualized on‑prem hosts and cloud computing. The engineer will support the centralization of DevOps and help existing teams adopt best practices within our environment. The candidate will manage complex tasks that span multiple stack layers, including working with cloud and on‑prem servers and automating system management (Python automation and scripting) to interact with our infrastructure more efficiently.

The customer develops and deploys systematic financial strategies across a variety of asset classes and global markets. We seek to produce high‑quality predictive signals (alphas) through our proprietary research platform to employ financial strategies focused on exploiting market inefficiencies. Our teams work collaboratively to drive the production of alphas and financial strategies – the foundation of a sustainable, global investment platform.

Responsibilities:

  • Design and deploy solutions‑as‑a‑service using open‑source technologies to automate system management, scaling and monitoring.
  • Develop tools that improve and streamline current processes such as deployment, monitoring and incident management in a distributed environment.
  • Work closely with development and operations teams to design software solutions that enhance service reliability.
  • Set up, configure and maintain monitoring and alerting systems that provide real‑time visibility into our systems.
  • Participate in on‑call rotations.
  • Contribute to the ongoing DevOps/agile transformation.
  • Leverage container orchestration tools (Kubernetes).
  • Use cloud infrastructure (AWS, GCP, Azure, etc.) and IaC tools (Helm, Ansible, Terraform) to ensure fast, safe and reliable deployments.

Requirements:

  • Deep expertise and hands‑on experience with Linux systems; strong focus on system optimization and troubleshooting.
  • Strong OOP and Python knowledge with hands‑on experience in automation, scripting and system management.
  • In‑depth knowledge of container orchestration technologies such as Kubernetes (K8s). Experience with other cluster management tools like Slurm is a plus.
  • Hands‑on experience with IaC tools like Helm, Terraform and Ansible.
  • Strong knowledge of containerization technologies (Docker, Podman) to ensure reliable and consistent deployments.
  • Experience with CI/CD tools, especially GitLab (preferred), GitHub, or Git.
  • Experience with monitoring and logging solutions such as Prometheus, Grafana, and the ELK stack.
  • Understanding of relational databases, tuning, and management in distributed systems (PostgreSQL, DynamoDB, Cassandra, etc.).
  • Familiarity with Agile development methodologies, focusing on continuous improvement and collaboration.
  • Exposure to cloud technologies such as AWS or GCP is a strong plus.
  • Team‑first attitude with excellent verbal and written communication skills in English; able to work collaboratively with peers across the organization.

Technical interviews: CV review, followed by a 1.5‑hour technical interview with the customer team. Topics include systems concepts (Linux, K8s, networking, containers) and live coding.

Location: Remote



  • WorkFromHome, Colombia Epsilon Solutions Ltd. SA de CV. Full time

    Sr. Site Reliability Engineer. Location: Colombia (remote). Employment type: Full‑time contract. Key skills: Microsoft technologies, IIS, Azure, AWS, Kubernetes (K8s); CI/CD pipeline: Git Action; IaC: CloudFormation, Terraform; monitoring: Grafana; troubleshooting in SRE (preferred engineering background). Responsibilities: 80% production support under...


  • WorkFromHome, Colombia N-iX Full time

    A leading technology firm located in Bogotá, Colombia is seeking a Site Reliability Engineer to enhance the reliability and scalability of software production environments, especially in onboarding new microservices. Responsibilities include automating workflows, managing service reliability, and collaborating across teams. The ideal candidate has strong...


  • WorkFromHome, Colombia Masabi Full time

    A leading fintech company is seeking a Lead Site Reliability Engineer to enhance system reliability. This remote role in Colombia involves designing reliable systems, contributing to incident response, and mentoring teams. Candidates should have substantial SRE or DevOps experience, particularly in AWS and infrastructure automation. A supportive and...


  • WorkFromHome, Colombia Blankfactor Full time

    This is a remote position as a full‑time employee in Colombia, paid in COP. It requires at least B2 English comprehension; please be sure to apply with your English CV. We are seeking a proactive and experienced Site Reliability Engineer (SRE) to join our team, focusing on maximizing the reliability, availability, and performance of our enterprise...


  • WorkFromHome, Colombia N-iX Full time

    N-iX. Bogota, D.C., Capital District, Colombia. Overview: Site Reliability Engineer (SRE) to help monitor, maintain, and scale software production environments, with a primary focus on onboarding new microservices. Work closely with development and platform teams to automate and program‑manage the onboarding lifecycle, from requirements and environment setup...


  • WorkFromHome, Colombia Truelogic Full time

    A leading technology firm in Colombia seeks a Site Reliability Engineer to enhance the reliability of systems on AWS and Kubernetes. The role emphasizes observability and automated responses to system behavior. Candidates should have over five years of experience in SRE roles and expertise in AWS and Kubernetes. This position offers fully remote work,...


  • WorkFromHome, Colombia Epsilon Solutions Ltd. SA de CV. Full time

    A leading technology solutions provider is seeking a Senior Site Reliability Engineer to provide production support and drive DevOps activities. This remote position focuses on troubleshooting issues in production and maintaining CI/CD pipelines while leveraging Microsoft technologies, AWS, and Kubernetes. Ideal candidates have strong skills in production...


  • WorkFromHome, Colombia AgileEngine Full time

    A leading software development company in Colombia is seeking a Site Reliability Engineer to design and deploy scalable cloud-native systems. The ideal candidate has over 8 years of experience in SRE, is highly proficient in AWS and Terraform, and excels in CI/CD pipelines. The role involves mentoring teams, improving system reliability, and implementing...


  • WorkFromHome, Colombia AgileEngine Full time

    Site Reliability Engineer ID45689 at AgileEngine. AgileEngine is an Inc. 5000 company that creates award‑winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people‑first culture has earned us multiple Best...


  • WorkFromHome, Colombia Masabi Full time

    Lead Site Reliability Engineer. Introducing Masabi: At Masabi, we’re driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank...