Lead Site Reliability Engineer
2 days ago
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Become a key member of our Enterprise Technology group as a Lead Site Reliability Engineer focused on maintaining and advancing critical infrastructure and applications. You will leverage your expertise in DevOps, cloud environments, and automation to design resilient and scalable systems. If you thrive on enhancing infrastructure and promoting continuous delivery practices, we invite you to join our team.
Responsibilities:
  Maintain and enhance enterprise application infrastructure using DevOps principles
  Design and oversee CI/CD pipelines to enable fast and dependable software deployments
  Administer and tune Kubernetes clusters for optimal scalability and security
  Create automation tools and scripts, primarily in Python
  Manage cloud infrastructure across Amazon Web Services and Microsoft Azure, focusing on security and identity management
  Partner with development teams to improve infrastructure as code via Terraform
  Monitor system metrics to proactively maintain high availability
  Coordinate operational requests and maintenance activities efficiently
  Diagnose and resolve complex infrastructure and deployment challenges
  Ensure adherence to security policies and industry best practices
  Document infrastructure setups and operational standards
  Contribute to disaster recovery and business continuity strategies
  Evaluate and integrate new technologies to boost system reliability and efficiency
Requirements:
  5 or more years of experience in Site Reliability Engineering or equivalent DevOps roles
  Advanced proficiency in Python programming
  Extensive expertise with AWS and Azure, including APIs, authentication, and serverless services
  Comprehensive knowledge of cloud networking, Kubernetes administration, security, IAM, and configuration automation
  Deep understanding of CI/CD workflows, version control, containerization, and Terraform-based infrastructure management
  Hands-on experience enabling and enhancing IaaS environments
  Proven success in enterprise-scale software development and release processes
  Solid grasp of automation concepts related to CI/CD and infrastructure management
  Strong analytical and complex problem-solving abilities
  Capability to handle operational requests and maintenance tasks effectively
  Excellent communication skills with English proficiency at B2+ level
We offer:
  International projects with top brands
  Work with global teams of highly skilled, diverse peers
  Employee financial programs
  Paid time off and sick leave
  Upskilling, reskilling and certification courses
  Unlimited access to the LinkedIn Learning library and 22,000+ courses
  Global career opportunities
  Volunteer and community involvement opportunities
  EPAM Employee Groups
  Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering, Information Technology, and Business Development
Industries: Software Development, IT Services and IT Consulting, and Investment Management
-
Remote Lead Site Reliability Engineer — Scale
7 days ago
WorkFromHome, Colombia | Masabi | Full-time
A leading fintech company is seeking a Lead Site Reliability Engineer to enhance system reliability. This remote role in Colombia involves designing reliable systems, contributing to incident response, and mentoring teams. Candidates should have substantial SRE or DevOps experience, particularly in AWS and infrastructure automation. A supportive and...
-
Lead Site Reliability Engineer
7 days ago
WorkFromHome, Colombia | Masabi | Full-time
Introducing Masabi: At Masabi, we’re driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank...
-
Senior Site Reliability Engineer — Remote
2 weeks ago
WorkFromHome, Colombia | Truelogic Software LLC | Full-time
A leading software development firm based in Colombia is looking for a Site Reliability Engineer to enhance the reliability of their AWS and Kubernetes systems. The engineer will focus on observability and operational improvements and will collaborate with various engineering teams. This position offers 100% remote work and a highly competitive USD salary, along...
-
Senior Site Reliability Engineer — Cloud
1 week ago
WorkFromHome, Colombia | AgileEngine | Full-time
A leading software development company in Colombia is seeking a Site Reliability Engineer to shape secure and scalable cloud-native systems. You will design resilient AWS infrastructure, lead CI/CD pipeline development, and mentor teams in DevSecOps practices. This role emphasizes innovation and collaboration with a focus on automation and observability....
-
Site Reliability Engineer ID45689
1 week ago
WorkFromHome, Colombia | AgileEngine | Full-time
AgileEngine is an Inc. 5000 company that creates award‑winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people‑first culture has earned us multiple Best...
-
Senior Site Reliability Engineer — Cloud
2 weeks ago
WorkFromHome, Colombia | AgileEngine | Full-time
A leading software development firm in Colombia is seeking an experienced Site Reliability Engineer (SRE) to enhance the reliability and efficiency of cloud-native systems. You will work closely with cross-functional teams, focusing on resilient AWS infrastructure and DevSecOps practices. Candidates should possess 8–10 years of experience in infrastructure or...
-
Site Reliability Engineer
2 weeks ago
WorkFromHome, Colombia | BairesDev | Full-time
Overview: Site Reliability Engineer at BairesDev (remote work). We are looking for a Site Reliability Engineer to administer and provide support for the project infrastructure hosted in the cloud while implementing CI/CD pipelines for the automation of deployments.
What You Will Do: Ensure high service availability, performance, security, and maintainability....
-
Senior Site Reliability Engineer — Remote
2 weeks ago
WorkFromHome, Colombia | Truelogic | Full-time
A leading nearshore staff augmentation firm in Bogotá seeks a Site Reliability Engineer to enhance the reliability of distributed systems on AWS and Kubernetes. Responsibilities include designing observability strategies, monitoring system behavior, and automating operational responses. The ideal candidate has over 5 years of experience in SRE/Platform...
-
Site Reliability Engineer
2 weeks ago
WorkFromHome, Colombia | Canonical | Full-time
Canonical is a leading provider of open‑source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. With customers that include the world's leading public...
-
Remote Site Reliability Engineer — CI/CD
2 weeks ago
WorkFromHome, Colombia | BairesDev | Full-time
A technology solutions company is seeking a Site Reliability Engineer to manage cloud infrastructure and automate deployments. Responsibilities include ensuring service availability, implementing CI/CD pipelines, and troubleshooting issues. Ideal candidates have experience with Kubernetes, Ansible, and CI/CD tools, along with advanced English skills. Enjoy a...