Principal Site Reliability Engineer
4 weeks ago
Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis.
Groupon is on a radical journey to transform our business with a relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company: big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.
**Principal Site Reliability Engineer**
**Role Overview**:
Are you ready to take your expertise to the next level and make a meaningful impact on the reliability and scalability of mission-critical systems? As a Principal Site Reliability Engineer (SRE Level V/VI), you will play a central role in ensuring the performance, availability, and resilience of our platforms. In this position, you will go beyond maintaining systems by leading initiatives that redefine operational excellence. You will collaborate with diverse teams to implement cutting-edge technologies and best practices, foster a culture of reliability, and mentor others in their growth as engineers. This is an exceptional opportunity for someone passionate about solving complex challenges and shaping the future of platform reliability in a high-impact role.
**Key Responsibilities**:
- Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher.
- Drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools.
- Create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery.
- Build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack.
- Collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs.
- Lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues.
- Design and execute performance testing, capacity planning, and scalability strategies for evolving workloads.
- Proactively identify and resolve bottlenecks, increasing system performance and developer efficiency.
- Mentor junior engineers, fostering a collaborative and growth-oriented team environment.
- Guide architectural decisions that drive innovation and enhance system reliability.
**Qualifications**:
- 10+ years in systems engineering, including at least 5 years in SRE or DevOps roles.
- Expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker).
- Proficiency in programming and scripting languages like Python, Go, and Bash.
- Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Deep understanding of networking, DNS, load balancing, and security principles.
- Proven track record of managing high-availability systems in demanding environments.
- Exceptional analytical and problem-solving skills.
**Preferred Qualifications**:
- Certifications in cloud or container technologies (e.g., AWS/GCP/Azure, Kubernetes CKA).
- Experience in industries like eCommerce, FinTech, or SaaS.
- Familiarity with Agile development processes and frameworks.
**What We Offer**:
- The opportunity to work with cutting-edge technologies in a transformative environment.
- A collaborative and innovative work culture that values your expertise and contributions.
- Professional growth and leadership development pathways tailored to your aspirations.
- A chance to leave a lasting impact by shaping the future of reliable and scalable systems.
**Join us to push the boundaries of platform reliability and drive meaningful change in a fast-evolving digital world.**
-
Site Reliability Engineer
3 weeks ago
Colombia | NTT DATA Europe & Latam | Full-time | NTT DATA is all of the people who make it up: a team of more than 139,000 professionals, as diverse as the 50 countries where we operate and the sectors in which we work: telecommunications, financial institutions, industry, utilities, energy, public administration, and healthcare. Our...
-
Site Reliability Engineer
5 days ago
Colombia | Captivate IO Ltd | Full-time | Position Overview: This is for a "Follow the Sun" model with support in New Zealand, the Philippines and Colombia. We are seeking an experienced Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have extensive experience in DevOps practices, continuous integration and continuous deployment (CI/CD) pipelines, and container...
-
Site Reliability Engineer
4 days ago
Colombia | Lightbird | Full-time | Join to apply for the Site Reliability Engineer - NOC role at UnifyCX. UnifyCX is looking for an extraordinary Site Reliability Engineer to join our motivated and ambitious team. This position requires candidates to be graduates/postgraduates in Engineering/Computer Science with at least 5 years of experience working in a SaaS and Cloud environment....
-
Site Reliability Engineer
3 weeks ago
Work from home, Valle del Cauca, Colombia | BairesDev | Full-time | Site Reliability Engineer - Remote Work: At BairesDev, we've been leading the way in technology projects for over 15 years. We deliver cutting-edge solutions to giants like Google and the most innovative startups in Silicon Valley. Our diverse 4,000+ team, composed of the world's Top 1% of tech talent, works remotely on roles that drive significant impact...
-
Site Reliability Engineer
3 weeks ago
Colombia | Captivate IO Ltd | Full-time | Position Overview: This is for a "Follow the Sun" model with support in New Zealand, the Philippines and Colombia. We are seeking an experienced Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have extensive experience in DevOps practices, continuous integration and continuous deployment (CI/CD) pipelines, and container...
-
Site Reliability Engineer
5 days ago
Colombia | GoDaddy | Full-time | Location Details: Colombia, remote. At GoDaddy, the future of work looks different for each team. Some teams work in the office full-time; others have a hybrid arrangement (they work remotely some days and in the office some days) and some work entirely remotely. Remote: This is a remote position, so you'll be working remotely from your home. You may...
-
Site Reliability Engineer
3 weeks ago
Work from home, Antioquia, Colombia | BairesDev | Full-time | At BairesDev, we've been leading the way in technology projects for over 15 years. We deliver cutting-edge solutions to giants like Google and the most innovative startups in Silicon Valley. Our diverse 4,000+ team, composed of the world's Top 1% of tech talent, works remotely on roles that drive significant impact worldwide. When you apply for this...
-
Site Reliability Engineer
3 weeks ago
Colombia | Tbwa ChiatDay Inc | Full-time | Site Reliability Engineer (Colombia, All-Levels). Colombia, Remote. The salary range for this role is $2,000 - $9,200 per month (gross, in USD). About Sezzle: With a mission to financially empower the next generation, Sezzle is revolutionizing the shopping experience beyond payments, blending cutting-edge tech with seamless, interest-free installment plans...
-
Site Reliability Engineer
4 days ago
Work from home, Meta, Colombia | BairesDev | Full-time | Site Reliability Engineer - Remote Work: WinDifferent specializes in helping businesses achieve rapid and sustainable growth through our powerful proprietary marketing system. Our data-driven solutions generate positive engagement that leads to ready-to-close opportunities, massively expanding sales pipelines and enabling companies to scale faster than the...
-
Site Reliability Engineer
3 weeks ago
Work from home, RAP Caribe, Colombia | BairesDev | Full-time | Site Reliability Engineer - Remote Work: WinDifferent specializes in helping businesses achieve rapid and sustainable growth through our powerful proprietary marketing system. Our data-driven solutions generate positive engagement that leads to ready-to-close opportunities, massively expanding sales pipelines and enabling companies to scale faster than the...
-
Site Reliability Engineer
4 days ago
Work from home, Norte de Santander, Colombia | BairesDev | Full-time | Site Reliability Engineer - Remote Work: WinDifferent specializes in helping businesses achieve rapid and sustainable growth through our powerful proprietary marketing system. Our data-driven solutions generate positive engagement that leads to ready-to-close opportunities, massively expanding sales pipelines and enabling companies to scale faster than the...