Senior Cloud Reliability Engineer

3 weeks ago


Colombia Gorilla Logic Full-time

Job Summary

As a Senior Cloud Reliability Engineer at Gorilla Logic, you will be responsible for developing and maintaining advanced observability solutions. This role focuses on enhancing blackbox and whitebox monitoring, implementing synthetic tests, and improving platform reliability across both on-premise and GCP environments, utilizing a variety of cutting-edge technologies.

Key Responsibilities

  • Lead efforts in blackbox monitoring, including the development and enhancement of the Health Mesh product.
  • Implement and manage synthetic tests that monitor critical platform services, providing early detection of incidents.
  • Utilize Prometheus for blackbox monitoring and develop simple Go APIs to support these activities.
  • Implement whitebox monitoring strategies with a focus on Service Level Objectives (SLOs) for core Google Cloud Platform (GCP) services and applications on OpenShift.
  • Ensure that both platform operators and customers have clear visibility into the system's performance and health.
  • Develop and refine anomaly detection mechanisms using the same metrics applied in whitebox monitoring.
  • Leverage tools such as Prometheus and Dynatrace to identify and address potential issues before they escalate, contributing to overall platform stability.
  • Create tools and processes that help operators distinguish between platform-level incidents and individual user errors.
  • Maintain and improve observability tools that support both on-premise and cloud environments, ensuring seamless operation across different infrastructure setups.
  • Collaborate with various teams to ensure effective incident management and response.

Requirements
  • Bachelor's degree in Computer Science, Engineering, or equivalent experience.
  • DevOps & SRE Experience: 3+ years of experience in DevOps and Site Reliability Engineering, with a focus on automation, infrastructure as code, and continuous integration/continuous deployment (CI/CD) practices.
  • Programming Experience: 3+ years of experience in programming, with a strong focus on Golang development.
  • Monitoring Tools Expertise: 3+ years of experience with APM and monitoring tools such as Dynatrace, Prometheus, ELK, Splunk, or similar.
  • Cloud and On-Premise Proficiency: Proficiency in Google Cloud Platform (GCP) and experience with on-premise environments, particularly with application deployment and management on OpenShift.
  • Container Orchestration: Experience with container orchestration technologies like Kubernetes (K8s) and OpenShift.
  • CI/CD Expertise: Experience with CI/CD deployment pipelines, ensuring automated and reliable deployment processes.
  • System Architecture: Demonstrable experience in designing and deploying scalable and resilient systems, with an understanding of cloud-native principles.
  • System Monitoring and Anomaly Detection: Extensive experience in implementing both blackbox and whitebox monitoring solutions, with a focus on SLOs and anomaly detection.

Bonus Skills
  • Linux Background: Knowledge of both Debian and Ubuntu environments.
  • Familiarity with Additional Tools: Experience with Jenkins, Terraform, Datadog, K6, or similar technologies.
  • Web Technologies: Understanding of web protocols and technologies such as TLS, REST, Nginx, and API gateways.


  • Colombia WIZELINE Full-time

    At Wizeline, we're looking for a skilled Senior Site Reliability Engineer to join our team. As a Senior Site Reliability Engineer, you will play a key role in ensuring the reliability and scalability of our cloud-based systems. Key responsibilities include: Establishing and implementing observability requirements for monitoring, logging, and...


  • Colombia WIZELINE Full-time

    About the Role: Wizeline is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems and applications. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud...


  • Colombia WIZELINE Full-time

    About Wizeline: Wizeline is a global digital services company that helps mid-size to Fortune 500 companies build, scale, and deliver high-quality digital products and services. Your Role: As a Senior Site Reliability Engineer at Wizeline, you will play a critical role in enabling the quick release of quality products, leading to faster innovation cycles for our...


  • Colombia Wizeline Full-time

    At Wizeline, we're on a mission to help mid-size to Fortune 500 companies build, scale, and deliver high-quality digital products and services. Our team thrives in solving customer challenges through human-centered experiences, digital core modernization, and intelligence everywhere (AI/ML and data). We help our clients succeed in building digital...

  • Senior Cloud Engineer

    3 weeks ago


    Colombia Gilder Search Group Full-time

    At Gilder Search Group, we're passionate about connecting talented professionals with innovative companies. Our mission is to empower individuals to thrive in their careers, and we're committed to making a positive impact in the tech industry. We're seeking a skilled Senior Cloud Engineer to join our team. As a key member of our cloud infrastructure team,...

  • Site Reliability Engineer

    3 weeks ago


    Colombia FullStack Labs Inc. Full-time

    The Role: We're seeking a highly skilled Site Reliability Engineer to join our team at FullStack Labs Inc. As a Site Reliability Engineer, you will play a critical role in ensuring the smooth operation of our cloud infrastructure and applications. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure solutions. Collaborate with...


  • Colombia WIZELINE Full-time

    Job Description: Wizeline is a global digital services company that helps mid-size to Fortune 500 companies build, scale, and deliver high-quality digital products and services. We thrive in solving our customers' challenges through human-centered experiences, digital core modernization, and intelligence everywhere (AI/ML and data). We help them succeed in...


  • Colombia Gorilla Logic Full-time

    Job Title: Senior Site Reliability Engineer. We are seeking a highly skilled Senior Site Reliability Engineer to join our SRE Cloud Space team. As a key member of our team, you will be responsible for developing and maintaining advanced observability solutions, enhancing blackbox and whitebox monitoring, and improving platform reliability across both...

  • Reliability Engineer

    3 weeks ago


    Colombia Captivate IO Ltd Full-time

    Job Description: Captivate IO Ltd is seeking a highly skilled Reliability Engineer - Cloud Infrastructure to join our team. As a key member of our infrastructure team, you will play a crucial role in ensuring the reliability, scalability, and performance of our cloud-based integration platforms. Key Responsibilities: Infrastructure Automation: Design, implement,...


  • Colombia Wizeline Full-time

    Wizeline is a global digital services company that helps businesses build, scale, and deliver high-quality digital products and services. Our team thrives in solving customer challenges through human-centered experiences, digital core modernization, and intelligence everywhere. Your Role: As a Senior Site Reliability Engineer at Wizeline, you will be...


  • Colombia, Huila Nucleus Health Full-time

    A U.S.-based company that is on a mission to develop the largest online marketplace and media platform in the world is looking for a Senior DevOps/SRE Engineer. The engineer will be working with cross-functional teams to raise system performance, reliability, and effectiveness. The company is developing a knowledge-commerce platform that connects clients and...


  • Colombia Rocket Full-time

    Senior Site Reliability Engineer: Reliability and Scalability Expert | Rocket.Chat | Remote. This position is for applicants with expertise in cloud infrastructure and reliability engineering. As a Senior Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of Rocket.Chat. Your expertise in designing,...

  • Site Reliability Engineer

    3 weeks ago


    Colombia FullStack Labs, LLC Full-time

    We're seeking a skilled Site Reliability Engineer to join our team at FullStack Labs, LLC. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our clients' cloud infrastructure. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure solutions. Collaborate with...


  • Colombia Gorilla Logic Full-time

    Senior Site Reliability Engineer: As a Senior Site Reliability Engineer within the SRE Cloud Space team, you will be at the forefront of developing and maintaining advanced observability solutions. This role focuses on enhancing blackbox and whitebox monitoring, implementing synthetic tests, and improving platform reliability across both on-premise and GCP...

  • Senior Software Engineer

    2 weeks ago


    Colombia Sofka Technologies Full-time

    About the Role: As a senior software engineer, you will be responsible for designing and developing cloud-based enterprise software solutions. You will work closely with our team to architect and implement scalable and secure cloud infrastructure. Key Responsibilities: Design and develop cloud-based software solutions. Collaborate with the team to architect and...


  • Colombia, Huila Datavail Full-time

    About the Team. Job: Site Reliability Engineer - Tier 2. Experience: 2-5 years (Tier 2). Key Skills: Linux, AWS, Terraform. Required Skills: At least 2 years of work experience with Linux, Windows, bash scripting, PowerShell and troubleshooting skills. We require at least one associate level cloud AWS certification. Able to...


  • Colombia Coupa Full-time

    Job Title: Sr. Database Reliability Engineer. Coupa is a leading provider of AI-driven spend management solutions, connecting and optimizing sourcing, purchasing, supply chains, and financial management for over 3,000 global organizations. We're seeking a highly skilled Sr. Database Reliability Engineer to join our team and contribute to the success of our...


  • Colombia Captivate IO Ltd Full-time

    Position Overview: We are seeking an experienced Site Reliability Engineer to join our dynamic team at Captivate IO Ltd. The ideal candidate will have extensive experience in DevOps practices, continuous integration, and continuous deployment (CI/CD) pipelines. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance...


  • Colombia Celes Full-time

    At Celes, we are looking for a Senior Cloud Engineer to join our team and optimize our infrastructure. What are we looking for? Solid experience: at least 3 years of experience managing cloud environments (AWS, Azure, or GCP). Technical mastery: advanced knowledge of Kubernetes, Terraform, and at least one programming language...

  • Senior Software Engineer

    1 week ago


    Colombia Teleperformance Colombia Full-time

    Job Title: Senior Software Engineer. Job Summary: We are seeking a Senior Software Engineer to join our team. The ideal candidate will have a strong background in cloud computing and software development. Responsibilities: * Design and develop cloud-based software applications * Collaborate with cross-functional teams to ensure successful project delivery *...