Principal Site Reliability Engineer

1 day ago


Colombia, Huila Groupon Full-time

Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis.

Groupon is on a radical journey to transform our business with relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company. We're big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.

**Principal Site Reliability Engineer**

**Role Overview**: Are you ready to take your expertise to the next level and make a meaningful impact on the reliability and scalability of mission-critical systems? As a Principal Site Reliability Engineer (SRE Level V/VI), you will play a central role in ensuring the performance, availability, and resilience of our platforms. In this position, you will go beyond maintaining systems by leading initiatives that redefine operational excellence. You will collaborate with diverse teams to implement cutting-edge technologies and best practices, foster a culture of reliability, and mentor others in their growth as engineers. This is an exceptional opportunity for someone passionate about solving complex challenges and shaping the future of platform reliability in a high-impact role.

**Key Responsibilities**:

- Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher.
- Drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools.
- Create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery.
- Build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack.
- Collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs.
- Lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues.
- Design and execute performance testing, capacity planning, and scalability strategies for evolving workloads.
- Proactively identify and resolve bottlenecks, increasing system performance and developer efficiency.
- Mentor junior engineers, fostering a collaborative and growth-oriented team environment.
- Guide architectural decisions that drive innovation and enhance system reliability.

**Qualifications**:

- 10+ years in systems engineering, with at least 5 years in SRE or DevOps roles.
- Expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker).
- Proficiency in programming and scripting languages like Python, Go, and Bash.
- Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Deep understanding of networking, DNS, load balancing, and security principles.
- Proven track record of managing high-availability systems in demanding environments.
- Exceptional analytical and problem-solving skills.

**Preferred Qualifications**:

- Certifications in cloud or container technologies (e.g., AWS/GCP/Azure, Kubernetes CKA).
- Experience in industries like eCommerce, FinTech, or SaaS.
- Familiarity with Agile development processes and frameworks.

**What We Offer**:

- The opportunity to work with cutting-edge technologies in a transformative environment.
- A collaborative and innovative work culture that values your expertise and contributions.
- Professional growth and leadership development pathways tailored to your aspirations.
- A chance to leave a lasting impact by shaping the future of reliable and scalable systems.

**Join us to push the boundaries of platform reliability and drive meaningful change in a fast-evolving digital world.**



  • Colombia MAS Global Consulting Full-time

    Who We Are: At MAS Global Consulting, we are a premium digital engineering partner delivering technology solutions to some of the world's most innovative companies, from high-growth startups to Fortune 500 enterprises. With a people-first culture and a commitment to excellence, we combine nearshore talent, agile delivery, and technical depth to build...

  • Azure DevOps Engineer

    6 days ago


    Colombia Axiom Path Inc Full-time

    **Azure DevOps Engineer / Site Reliability Engineer** **Contract, 100% REMOTE** - In this role, you will leverage your DevOps expertise to design, automate, and streamline the software development lifecycle while playing a crucial role in maintaining website uptime. This role requires a strong ability to handle emergencies, troubleshoot website outages, and...


  • Colombia Sana Commerce Full-time

    Medellín - IT **Junior Site Reliability Engineer**: - Medellín IT - At Sana Commerce we're committed to an inclusive environment and recognize that our diverse workforce is one of our greatest strengths. It all started in 2007, with a pizza and a plan. **Sana Commerce is an e-commerce platform designed to help manufacturers, distributors and...

  • Reliability Engineer

    6 days ago


    Colombia, Huila Baker Hughes Full-time

    Role Description **Reliability Engineer** **Summary** Can work with limited supervision on assigned tasks with standard techniques to build on basic knowledge and develop skills in specific practice areas. Interacts with clients and client organisations and has an understanding of how maintenance management is executed. Understands project management...


  • Colombia Felix Technologies, Inc. Full-time

    About Us At Félix, we're building the financial ecosystem for Latin immigrants in the U.S., starting with a revolution in remittances. Our core product is an AI-powered chatbot built on WhatsApp, allowing our users to send money home as easily as sending a text message. We leverage cutting-edge technology like AI, blockchain, and stablecoins to make...


  • Colombia, Huila Groupon Full-time

    Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms...


  • Colombia MAS Global Consulting Full-time

    Who We Are: At MAS Global Consulting, we bring together diverse engineering talent and meaningful work opportunities with global clients who value innovation, quality, and people-first collaboration. Our mission is to help organizations build scalable, modern, and resilient platforms while enabling our consultants to grow in their careers. We are proud to...


  • Colombia Kyndryl Colombia SAS Full-time

    **Why Kyndryl** Kyndryl is a market leader that thinks and acts like a start-up. We design, build, manage, and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our...

  • Reliability Engineer

    2 weeks ago


    Colombia Neostella Full-time

    **TO BE CONSIDERED, CANDIDATES MUST HAVE A TOURIST VISA TO TRAVEL TO THE UNITED STATES** **What you’ll do**: Our manufacturing client is looking for a Reliability Engineer. The Reliability Engineer is a hands-on technical role that is critical in leading all engineering efforts in collaboration with the operating teams & unit operations across multiple...


  • Colombia Yuxi Global Full-time

    Company Description: Yuxi Global is an American company with high-functioning teams across Latin America. We stay up to date with the most modern, cutting-edge practices and technologies. Our teams are versatile, adaptable and have expertise in a wide range of programming languages, databases and frameworks. This is an invitation to someone who loves working with the...