Site Reliability Engineer
2 weeks ago
We are looking for a Senior Site Reliability Engineer (SRE) to join a team that works on a distributed architecture spanning physical machines, virtualized on-prem hosts, and cloud computing. The engineer will support the centralization of DevOps and help existing teams adopt best practices within our environment. The candidate will manage complex tasks that span multiple stack layers, including working with cloud and on-prem servers and automating system management (Python automation and scripting) to interact with our infrastructure more efficiently.

The customer develops and deploys systematic financial strategies across a variety of asset classes and global markets. We seek to produce high-quality predictive signals (alphas) through our proprietary research platform to employ financial strategies focused on exploiting market inefficiencies. Our teams work collaboratively to drive the production of alphas and financial strategies, the foundation of a sustainable, global investment platform.

Responsibilities:
Design and deploy solutions-as-a-service using open-source technologies to automate system management, scaling, and monitoring.
Develop tools that improve and streamline current processes such as deployment, monitoring, and incident management in a distributed environment.
Work closely with development and operations teams to design software solutions that enhance service reliability.
Set up, configure, and maintain monitoring and alerting systems that provide real-time visibility into our systems.
Participate in on-call rotations.
Contribute to the ongoing DevOps/agile transformation.
Leverage container orchestration tools (Kubernetes).
Use cloud infrastructure (AWS, GCP, Azure, etc.) and IaC tools (Helm, Ansible, Terraform) to ensure fast, safe, and reliable deployments.

Requirements:
Deep expertise and hands-on experience with Linux systems, with a strong focus on system optimization and troubleshooting.
Strong OOP and Python knowledge, with hands-on experience in automation, scripting, and system management.
In-depth knowledge of container orchestration technologies such as Kubernetes (K8s). Experience with other cluster management tools like Slurm is a plus.
Hands-on experience with IaC tools like Helm, Terraform, and Ansible.
Strong knowledge of containerization technologies (Docker, Podman) to ensure reliable and consistent deployments.
Experience with CI/CD tools, especially GitLab (preferred), GitHub, or Git.
Experience with monitoring and logging solutions such as Prometheus, Grafana, and the ELK stack.
Understanding of relational and NoSQL databases, including tuning and management in distributed systems (PostgreSQL, DynamoDB, Cassandra, etc.).
Familiarity with Agile development methodologies, focusing on continuous improvement and collaboration.
Exposure to cloud technologies such as AWS or GCP is a strong plus.
Team-first attitude with excellent verbal and written communication skills in English, and the ability to work collaboratively with peers across the organization.

Technical interviews: CV review, followed by a 1.5-hour technical interview with the customer team. Topics include systems concepts (Linux, K8s, networking, containers) and live coding.

Location: Remote
-
WorkFromHome, Colombia N-iX Full-time
A leading technology firm located in Bogotá, Colombia is seeking a Site Reliability Engineer to enhance the reliability and scalability of software production environments, especially in onboarding new microservices. Responsibilities include automating workflows, managing service reliability, and collaborating across teams. The ideal candidate has strong...
-
Remote Lead Site Reliability Engineer — Scale
2 weeks ago
WorkFromHome, Colombia Masabi Full-time
A leading fintech company is seeking a Lead Site Reliability Engineer to enhance system reliability. This remote role in Colombia involves designing reliable systems, contributing to incident response, and mentoring teams. Candidates should have substantial SRE or DevOps experience, particularly in AWS and infrastructure automation. A supportive and...
-
Site Reliability Engineer
2 days ago
WorkFromHome, Colombia N-iX Full-time
N-iX, Bogota, D.C., Capital District, Colombia
Overview: Site Reliability Engineer (SRE) to help monitor, maintain, and scale software production environments, with a primary focus on onboarding new microservices. Work closely with development and platform teams to automate and program-manage the onboarding lifecycle, from requirements and environment setup...
-
Site Reliability Engineer ID45689
2 weeks ago
WorkFromHome, Colombia AgileEngine Full-time
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best...
-
Lead Site Reliability Engineer
2 weeks ago
WorkFromHome, Colombia Masabi Full-time
Introducing Masabi: At Masabi, we're driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank...
-
Site Reliability Engineer
2 days ago
WorkFromHome, Colombia DCT Full-time
DCT, Bogota, D.C., Capital District, Colombia
Responsibilities:
Service & Infrastructure Management: Oversee and manage core platform web services, including API and database servers, to ensure optimal performance and health.
System Monitoring & Emergency Response: Proactively monitor application and infrastructure health...
-
Senior Site Reliability Engineer — Cloud
2 weeks ago
WorkFromHome, Colombia AgileEngine Full-time
A leading software development company in Colombia is seeking a Site Reliability Engineer to shape secure and scalable cloud-native systems. You will design resilient AWS infrastructure, lead CI/CD pipeline development, and mentor teams in DevSecOps practices. This role emphasizes innovation and collaboration with a focus on automation and observability....
-
Site Reliability Engineer — Remote, Kubernetes
2 days ago
WorkFromHome, Colombia BairesDev Full-time
A leading technology solutions provider is seeking a Site Reliability Engineer to support and administer cloud project infrastructure. The role involves ensuring service availability and implementing CI/CD pipelines for automation. Candidates should have over 2 years of experience as an Infrastructure Engineer, familiarity with Kubernetes, and proficiency...
-
Site Reliability Engineer
3 days ago
WorkFromHome, Colombia Félix Full-time
Overview: At Félix, we're building the financial ecosystem for Latin immigrants in the U.S., starting with a revolution in remittances. Our core product is an AI-powered chatbot built on WhatsApp, allowing our users to send money home as easily as sending a text message. We leverage cutting-edge technology like AI, blockchain, and stablecoins to make...
-
Lead Site Reliability Engineer
6 days ago
WorkFromHome, Colombia EPAM Systems Full-time
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of...