Senior Cloud Reliability Engineer
Posted 3 weeks ago
Job Summary
As a Senior Cloud Reliability Engineer at Gorilla Logic, you will develop and maintain advanced observability solutions. The role focuses on enhancing blackbox and whitebox monitoring, implementing synthetic tests, and improving platform reliability across both on-premise and Google Cloud Platform (GCP) environments.
Key Responsibilities
- Lead efforts in blackbox monitoring, including the development and enhancement of the Health Mesh product.
- Implement and manage synthetic tests that monitor critical platform services, providing early detection of incidents.
- Utilize Prometheus for blackbox monitoring and develop simple Go APIs to support these activities.
- Implement whitebox monitoring strategies with a focus on Service Level Objectives (SLOs) for core Google Cloud Platform (GCP) services and applications on OpenShift.
- Ensure that both platform operators and customers have clear visibility into the system's performance and health.
- Develop and refine anomaly detection mechanisms using the same metrics applied in whitebox monitoring.
- Leverage tools such as Prometheus and Dynatrace to identify and address potential issues before they escalate, contributing to overall platform stability.
- Create tools and processes that help operators distinguish between platform-level incidents and individual user errors.
- Maintain and improve observability tools that support both on-premise and cloud environments, ensuring seamless operation across different infrastructure setups.
- Collaborate with various teams to ensure effective incident management and response.
Requirements
- Bachelor's degree in Computer Science, Engineering, or equivalent experience.
- DevOps & SRE Experience: 3+ years of experience in DevOps and Site Reliability Engineering, with a focus on automation, infrastructure as code, and continuous integration/continuous deployment (CI/CD) practices.
- Programming Experience: 3+ years of experience in programming, with a strong focus on Golang development.
- Monitoring Tools Expertise: 3+ years of experience with APM and monitoring tools such as Dynatrace, Prometheus, ELK, Splunk, or similar.
- Cloud and On-Premise Proficiency: Proficiency in Google Cloud Platform (GCP) and experience with on-premise environments, particularly with application deployment and management on OpenShift.
- Container Orchestration: Experience with container orchestration technologies like Kubernetes (K8s) and OpenShift.
- CI/CD Expertise: Experience with CI/CD deployment pipelines, ensuring automated and reliable deployment processes.
- System Architecture: Demonstrable experience in designing and deploying scalable and resilient systems, with an understanding of cloud-native principles.
- System Monitoring and Anomaly Detection: Extensive experience in implementing both blackbox and whitebox monitoring solutions, with a focus on SLOs and anomaly detection.
Bonus Skills
- Linux Background: Knowledge of both Debian and Ubuntu environments.
- Familiarity with Additional Tools: Experience with Jenkins, Terraform, Datadog, K6, or similar technologies.
- Web Technologies: Understanding of web protocols and technologies such as TLS, REST, Nginx, and API gateways.