Lead Site Reliability Engineer

22 hours ago


WorkFromHome, Colombia EPAM Systems Full-time

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Become a key member of our Enterprise Technology group as a Lead Site Reliability Engineer focused on maintaining and advancing critical infrastructure and applications. You will leverage your expertise in DevOps, cloud environments, and automation to design resilient and scalable systems. If you thrive on enhancing infrastructure and promoting continuous delivery practices, we invite you to join our team.

Responsibilities

  • Maintain and enhance enterprise application infrastructure using DevOps principles
  • Design and oversee CI/CD pipelines to enable fast and dependable software deployments
  • Administer and tune Kubernetes clusters for optimal scalability and security
  • Create automation tools and scripts, primarily in Python
  • Manage cloud infrastructure across Amazon Web Services and Microsoft Azure, focusing on security and identity management
  • Partner with development teams to improve infrastructure as code via Terraform
  • Monitor system metrics to proactively maintain high availability
  • Coordinate operational requests and maintenance activities efficiently
  • Diagnose and resolve complex infrastructure and deployment challenges
  • Ensure adherence to security policies and industry best practices
  • Document infrastructure setups and operational standards
  • Contribute to disaster recovery and business continuity strategies
  • Evaluate and integrate new technologies to boost system reliability and efficiency

Requirements

  • 5+ years of experience in Site Reliability Engineering or equivalent DevOps roles
  • Advanced proficiency in Python programming
  • Extensive expertise with AWS and Azure, including APIs, authentication, and serverless services
  • Comprehensive knowledge of cloud networking, Kubernetes administration, security, IAM, and configuration automation
  • Deep understanding of CI/CD workflows, version control, containerization, and Terraform-based infrastructure management
  • Hands-on experience enabling and enhancing IaaS environments
  • Proven success in enterprise-scale software development and release processes
  • Solid grasp of automation concepts related to CI/CD and infrastructure management
  • Strong analytical and complex problem-solving abilities
  • Capability to handle operational requests and maintenance tasks effectively
  • Excellent communication skills with English proficiency at B2+ level

We offer

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering, Information Technology, and Business Development
Industries: Software Development, IT Services and IT Consulting, and Investment Management



  • WorkFromHome, Colombia Masabi Full-time

    A leading fintech company is seeking a Lead Site Reliability Engineer to enhance system reliability. This remote role in Colombia involves designing reliable systems, contributing to incident response, and mentoring teams. Candidates should have substantial SRE or DevOps experience, particularly in AWS and infrastructure automation. A supportive and...


  • WorkFromHome, Colombia Masabi Full-time

    Lead Site Reliability Engineer Introducing Masabi // At Masabi, we’re driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank...


  • WorkFromHome, Colombia AgileEngine Full-time

    Site Reliability Engineer ID45689 at AgileEngine. AgileEngine is an Inc. 5000 company that creates award‑winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people‑first culture has earned us multiple Best...


  • WorkFromHome, Colombia Masabi Full-time

    Introducing Masabi // At Masabi, we’re driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank card to travel. Our Justride...


  • WorkFromHome, Colombia Truelogic Full-time

    A leading technology firm in Colombia seeks a Site Reliability Engineer to enhance the reliability of systems on AWS and Kubernetes. The role emphasizes observability and automated responses to system behavior. Candidates should have over five years of experience in SRE roles and expertise in AWS and Kubernetes. This position offers fully remote work,...


  • WorkFromHome, Colombia NiCE Full-time

    A global technology company is seeking a Senior Site Reliability Engineer in Medellín to enhance the reliability and scalability of its platform. This hybrid role offers ownership of critical systems and opportunities for professional growth with comprehensive company benefits. The ideal candidate has extensive experience in Linux and cloud infrastructure,...


  • WorkFromHome, Colombia AgileEngine Full-time

    Site Reliability Engineer (ID45689) – AgileEngine. Why Join Us: AgileEngine is an Inc. 5000 company that creates award‑winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in application development and AI/ML and have earned multiple Best Place to Work awards. If you're looking for a place to...


  • WorkFromHome, Colombia Truelogic Software Full-time

    Site Reliability Engineer (AWS) - Technology. About Truelogic: At Truelogic we are a leading provider of nearshore staff augmentation services headquartered in New York. For over two decades, we’ve been delivering top-tier technology solutions to companies of all...

  • Site Reliability Engineer

    2 weeks ago


    WorkFromHome, Colombia BairesDev Full-time

    Overview: Site Reliability Engineer at BairesDev – Remote work. We are looking for a Site Reliability Engineer to administer and provide support for the project infrastructure hosted in the cloud while implementing CI/CD pipelines for the automation of deployments. What You Will Do: Ensure high service availability, performance, security, and maintainability....