Site Reliability Engineer

3 weeks ago


Bogotá, Bogotá D.E., Colombia Patagonian Full-time

We are seeking a Senior Site Reliability Engineer to join a team that works on a complex distributed architecture spanning physical machines, virtualized on-prem hosts, and cloud computing.
The role involves helping to set up a centralized DevOps function and helping existing teams adopt shared best practices. The ideal candidate can manage complexity and tackle problems across multiple stack layers as part of a small team championing operational excellence.

Responsibilities:

  • Architecture and Automation: Design and deploy As-A-Service solutions using open-source software to automate system management, scaling, and monitoring.
  • System Optimization: Develop tools to streamline deployment, monitoring, and incident management for large-scale, distributed environments.
  • Collaboration Across Teams: Work with development and operations teams to design and implement software solutions that enhance the overall reliability of services. Contribute to the ongoing DevOps and Agile transformation.
  • Monitoring & Incident Response: Set up, configure, and maintain monitoring and alerting systems to ensure real-time visibility into system performance. Participate in on-call rotations to respond to incidents and mitigate downtime.
  • CI/CD & Infrastructure Management: Continuously improve CI/CD pipelines using tools like GitLab, Helm, Terraform, and Ansible, ensuring fast, safe, and reliable deployments.
  • Container Orchestration: Leverage container orchestration platforms like Kubernetes (K8S) to manage distributed systems at scale. Experience with Slurm or similar cluster management is a plus.
  • Cloud and Automation Tools: Use cloud infrastructure (AWS, GCP, etc.) and Infrastructure as Code (IaC) tools to automate the provisioning and scaling of resources.

Requirements:

  • Linux Systems: Deep expertise and hands-on experience working with Linux-based systems, with a focus on optimization and troubleshooting.
  • Python Proficiency: Strong skills in Python for scripting, automation, and system management.
  • Containerization & Orchestration: In-depth knowledge of container orchestration technologies such as Kubernetes (K8S). Experience with other cluster management tools like Slurm is a plus.
  • Infrastructure as Code (IaC): Hands-on experience with tools like Helm, Terraform, and Ansible to manage infrastructure in a scalable and automated way.
  • Container Technologies: Strong working knowledge of Docker, Podman, or other containerization systems to enable efficient and consistent deployment.
  • CI/CD Pipelines: Experience working with CI/CD tools, especially GitLab (preferred), GitHub, or Git, to ensure smooth and rapid delivery cycles.
  • Monitoring & Logging: Experience with monitoring and logging solutions such as Prometheus, Grafana, and the ELK stack to provide comprehensive insights into system performance and health.
  • Relational Databases: Understanding of relational databases, their performance tuning, and management in distributed systems.
  • Agile Development: Familiarity with Agile development methodologies, with a focus on continuous improvement and collaboration.
  • Cloud Experience: Exposure to cloud technologies such as AWS or Google Cloud (GCP) is a strong plus.
  • Collaboration & Communication: A team-first attitude with excellent verbal and written communication skills in English, able to work collaboratively with peers across the organization.

  • Bogotá, Bogotá D.E., Colombia Softtek Full-time

    Senior Site Reliability Engineer. Sophisticated IT systems require a deep understanding of infrastructure management, cloud platforms, and network principles. We are looking for a seasoned Senior Site Reliability Engineer who can develop and implement advanced monitoring, alerting, and logging solutions to prevent system failures. Main Responsibilities: Implement...


  • Bogotá, Bogotá D.E., Colombia Softtek Full-time

    Senior Site Reliability Engineer. 3 days ago. At Softtek Colombia, we are hiring the best talent to be part of our team. If you have at least 5 years of experience in this role, it's your time. Site Reliability Engineer SR. Requirements: Experience in infrastructure management, cloud platforms (AWS, Azure, GCP), and Linux/Windows...


  • Bogotá, Bogotá D.E., Colombia CI&T Full-time

    Mid-level Site Reliability Engineer (SRE) - Azure, Colombia. We're looking for a Mid-level Site Reliability Engineer (SRE) to help ensure the uptime and reliability of our applications across Azure Cloud, on-prem environments, and Azure Kubernetes Service (AKS). You'll be responsible for incident response, deployments, automation, and infrastructure support...


  • Bogotá, Bogotá D.E., Colombia Softtek Full-time

    At Softtek Colombia, we are hiring the best talent to be part of our team. If you have at least 5 years of experience in this role, it's your time. Site Reliability Engineer SR. Requirements: Experience in infrastructure management, cloud platforms (AWS, Azure, GCP), and Linux/Windows environments. Strong skills in automation, scripting (Python, Bash,...


  • Bogotá, Bogotá D.E., Colombia CI&T Software S.A. Full-time

    Mid-level Site Reliability Engineer (SRE) - Azure, Colombia. We are tech transformation specialists, uniting human expertise with AI to create scalable tech solutions. With over 6,500 CI&Ters around the world, we've built partnerships with more than 1,000 clients during our 30 years of history. Artificial Intelligence is our reality. We're looking for a Mid...

  • Site Reliability Lead

    17 hours ago


    Bogotá, Bogotá D.E., Colombia Softtek Full-time

    As a Site Reliability Lead, you'll oversee the design, implementation, and maintenance of our infrastructure. With a focus on efficiency, scalability, and reliability, you'll develop and optimize monitoring, logging, and alerting solutions. Automation, scripting, and infrastructure as code will be your allies in achieving high availability and...

  • Reliability Engineer

    4 days ago


    Bogotá, Bogotá D.E., Colombia Tekton Labs Full-time

    Join us at Tekton Labs as a Reliability Engineer and contribute to an exciting project for one of our clients in North America. This role requires a detail-oriented individual with a strong background in functional testing and experience working with Jira. Duties and Responsibilities: Perform functional testing to ensure system quality and reliability. Prepare...


  • Bogotá, Bogotá D.E., Colombia Ciandt Full-time

    We are tech transformation specialists, uniting human expertise with AI to create scalable tech solutions. With over 6,500 CI&Ters around the world, we've built partnerships with more than 1,000 clients during our 30 years of history. Artificial Intelligence is our reality. We're looking for a Mid-level Site Reliability Engineer (SRE) to help ensure the uptime...

  • Site Reliability Engineer

    3 weeks ago


    Bogotá, Bogotá D.E., Colombia Tbwa ChiatDay Inc Full-time

    The salary range for this role is 20 to 45 lakhs INR per annum (Gross) About Sezzle: With a mission to financially empower the next generation, Sezzle is revolutionizing the shopping experience beyond payments, blending cutting-edge tech with seamless, interest-free installment plans that make shopping smarter and more accessible. We're not just...


  • Bogotá, Bogotá D.E., Colombia Tbwa ChiatDay Inc Full-time

    The salary range for this role is 20 to 45 lakhs INR per annum (Gross). About Sezzle: With a mission to financially empower the next generation, Sezzle is revolutionizing the shopping experience beyond payments, blending cutting-edge tech with seamless, interest-free installment plans that make shopping smarter and more accessible. We're not just transforming...


  • Bogotá, Bogotá D.E., Colombia Infobip ltd. Full-time

    Reliability Operations Engineer. Remote type: Hybrid. Location: Bogota (Colombia). Time type: Full time. Posted yesterday. Job requisition ID: JR102826. At Infobip, we dream big. We value creativity, persistence, and innovation, passionately believing that it is through teamwork that we can all reach greater...


  • Bogotá, Bogotá D.E., Colombia Amadeus Full-time

    Service Reliability Engineer. Location: Bogota. Time type: Full time. Posted 3 days ago. End date: March 30, 2025 (16 days left to apply). Job requisition ID: R26055. Job Title: Service Reliability Engineer. About the Business Area/Department: Are you ready to be part of a dynamic team that keeps...


  • Bogotá, Bogotá D.E., Colombia Infobip ltd. Full-time

    Reliability Operations Engineer. Remote type: Hybrid. Location: Bogota (Colombia). Time type: Full time. Posted yesterday. Job requisition ID: JR102826. At Infobip, we dream big. We value creativity, persistence, and innovation, passionately believing that it is through teamwork that we can all reach...


  • Bogotá, Bogotá D.E., Colombia Scotiabank Full-time

    Key Responsibilities: Lead senior stakeholder communication and build strategic initiatives to address major challenges in achieving best-in-class stable, reliable, secure, and performing systems. Implement and govern a 'Lean' technology service management system among service owners to ensure continuous reliability improvement. Drive prioritization of...


  • Bogotá, Bogotá D.E., Colombia Softtek Full-time

    Senior Site Reliability Engineer. A world-class Senior Site Reliability Engineer is a strategic partner who drives business outcomes through technology innovation and operational excellence. This role requires a strong technical foundation, excellent communication skills, and a customer-centric mindset to deliver exceptional results. Main Responsibilities: Design...


  • Bogotá, Bogotá D.E., Colombia Softtek Full-time

    Senior Site Reliability Engineer. A successful Senior Site Reliability Engineer is responsible for ensuring the reliability, scalability, and performance of complex IT systems. This role requires a unique blend of technical expertise, business acumen, and collaboration skills to drive continuous improvement and innovation. Main Responsibilities: Develop and...


  • Bogotá, Bogotá D.E., Colombia Ciandt Full-time

    CI&T is a tech transformation specialist that combines human expertise with AI to drive business growth. With a global presence and 30 years of experience, we've developed strong relationships with over 1,000 clients. We're seeking a skilled Mid-level Site Reliability Engineer to join our team and contribute to the success of our applications. As an SRE,...


  • Bogotá, Bogotá D.E., Colombia Scotiabank Full-time

    Job Description: We are seeking an experienced Senior Site Reliability Engineer to join our Caribbean, Central America & Uruguay (CCAU) Technology Systems Reliability team as the CCAU Director. The successful candidate will be responsible for leading a team of 4 engineers and collaborating with cross-functional teams to design, build, and operate highly...

  • Site Reliability Engineer

    4 weeks ago


    Bogotá, Bogotá D.E., Colombia Michael Page Full-time

    About our client: Our client is a multinational organization in the technology industry focused on financial services. With a focus on technology and innovation, they are committed to developing and delivering cutting-edge financial solutions. Description: - Develop and maintain high-quality systems. - Collaborate with...

  • Site Reliability Engineer

    2 weeks ago


    Bogotá, Bogotá D.E., Colombia Michael Page Colombia Full-time

    About our client: Our client is a multinational organization in the technology industry focused on financial services. With a focus on technology and innovation, they are committed to developing and delivering cutting-edge financial solutions. Description: Develop and maintain high-quality systems. Collaborate with the...