Site Reliability Engineer

2 days ago


Colombia · Lean Tech · Full-time

Company Overview:

Lean Tech is a rapidly expanding organization based in Medellín, Colombia. We pride ourselves on having one of the most influential networks in software development and IT services for the entertainment, financial, and logistics sectors. Our growth plans offer professionals many opportunities to advance their careers and grow substantially. Joining our team means working with large engineering teams across Latin America and the United States and contributing to cutting-edge developments in multiple industries.

Currently, we are seeking a Site Reliability Engineer (SRE) to join our team. Here are the challenges that our next warrior will face and the requirements we look for:

Position Title: Site Reliability Engineer (SRE)

Location: Remote (Colombia)

What you will be doing:

This senior-level position is focused on the design, implementation, and maintenance of robust, scalable, and high-performing infrastructure. The primary purpose of this role is to collaborate closely with development teams to ensure system stability and scalability through advanced automation and monitoring improvements. Key responsibilities include architecting, deploying, and maintaining systems on AWS, managing Kubernetes clusters, and developing CI/CD pipelines. This position requires an advanced understanding of AWS, Kubernetes, Prometheus, and Grafana, as well as proficiency in scripting with Python, Bash, or Go. The role is integral to the company’s broader mission, emphasizing streamlined integration and deployment within a collaborative work environment.

  1. Architect and maintain scalable, reliable systems on AWS, following AWS best practices.
  2. Oversee Kubernetes clusters to ensure optimal performance and availability in production environments.
  3. Develop and implement comprehensive monitoring and visualization strategies utilizing Prometheus and Grafana.
  4. Define, measure, and report on SLOs, SLIs, and SLAs to continuously enhance system reliability and performance.
  5. Drive automation of operational tasks through Infrastructure as Code tools like Terraform and CloudFormation.
  6. Create robust CI/CD pipelines to facilitate seamless and efficient software deployments.
  7. Perform in-depth root cause analyses on production issues and implement comprehensive solutions to prevent recurrence.
  8. Design, update, and manage detailed runbooks and escalation processes to improve incident management efficiency.
  9. Collaborate closely with development and DevOps teams to ensure effective integration and deployment processes.
  10. Document systems, configurations, and processes with precision to support operational continuity and knowledge sharing.

Required Skills & Experience:

  1. Advanced proficiency in AWS services for architecting, deploying, and maintaining scalable and reliable systems.
  2. Advanced expertise in managing Kubernetes in production environments to ensure high availability and performance.
  3. Strong proficiency in Prometheus for monitoring and Grafana for visualization.
  4. Intermediate understanding and use of CI/CD tools such as GitHub Actions, Jenkins, GitLab CI/CD, or CircleCI.
  5. Intermediate proficiency with Infrastructure as Code tools like Terraform or CloudFormation.
  6. Experience with configuration management tools including Ansible, Chef, or Puppet.
  7. Proficiency in scripting languages such as Python, Bash, or Go.
  8. Solid understanding of Linux/Unix systems and networking concepts.
  9. Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
  10. Minimum of 3 years in a Site Reliability Engineer or DevOps role.

Nice to Haves:

  1. Experience with log aggregation tools such as ELK Stack or Fluentd for efficient log management.
  2. Knowledge of database systems, both SQL and NoSQL, to support diverse data storage needs.
  3. Familiarity with service meshes like Traefik, Istio, or Linkerd to enhance microservices communication.
  4. Experience with cloud-native application development and serverless architectures.
  5. Excellent problem-solving skills with a focus on improving system efficiency and performance.
  6. Strong communication and collaboration abilities for effective team interaction.

Soft Skills:

  1. Excellent problem-solving and analytical skills.
  2. Strong communication and collaboration abilities, with the capacity to work effectively across different time zones.

Why you will love Lean Tech:

  1. Join a powerful tech workforce and help us change the world through technology.
  2. Professional development opportunities with international customers.
  3. Collaborative work environment.
  4. Career path and mentorship programs that help you reach new levels in your career.

Join Lean Tech and contribute to shaping the data landscape within a dynamic and growing organization. Your skills will be honed, and your contributions will play a vital role in our continued success. Lean Tech is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.


  • Colombia · Captivate IO Ltd · Full-time

    Position Overview: We are seeking an experienced Site Reliability Engineer to join our dynamic team at Captivate IO Ltd. The ideal candidate will have extensive experience in DevOps practices, continuous integration, and continuous deployment (CI/CD) pipelines. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance...

  • Azure DevOps Engineer

    6 months ago


    Colombia · Axiom Path Inc · Full-time

    **Azure DevOps Engineer / Site Reliability Engineer** **Contract, 100% REMOTE** - In this role, you will leverage your DevOps expertise to design, automate, and streamline the software development lifecycle while playing a crucial role in maintaining website uptime. This role requires a strong ability to handle emergencies, troubleshoot website outages, and...


  • Colombia · Captivate IO Ltd · Full-time

    Position Overview: This is for a "Follow the Sun" model with support in New Zealand, the Philippines and Colombia. We are seeking an experienced Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have extensive experience in DevOps practices, continuous integration and continuous deployment (CI/CD) pipelines, and container...


  • Colombia · Sana Commerce · Full-time

    **Junior Site Reliability Engineer**, Medellín, IT. At Sana Commerce we're committed to an inclusive environment and recognize that our diverse workforce is one of our greatest strengths. It all started in 2007, with a pizza and a plan. Sana Commerce is an e-commerce platform designed to help manufacturers, distributors and...


  • Colombia · WIZELINE · Full-time

    About the Role: Wizeline is a global digital services company that helps mid-size to Fortune 500 companies build, scale, and deliver high-quality digital products and services. As a key member of our team, you will play a crucial role in ensuring the reliability and efficiency of our technology infrastructure. Your Day-to-Day: You will be responsible for...


  • Colombia · Rocket · Full-time

    About the Role: We are seeking a skilled Senior Cloud Systems Specialist to join our team at Rocket.Chat. As a Senior Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our platform. Your Responsibilities: Develop and maintain Infrastructure as Code (IaC) using tools like Terraform and...


  • Colombia, Huila · Nucleus Health · Full-time

    A U.S.-based company on a mission to build the largest online marketplace and media platform in the world is looking for a Senior DevOps/SRE Engineer. The engineer will work with cross-functional teams to improve system performance, reliability, and effectiveness. The company is developing a knowledge-commerce platform that connects clients and...

  • Site Reliability Engineer

    4 weeks ago


    Colombia · Tbwa ChiatDay Inc · Full-time

    Site Reliability Engineer (Colombia, All-Levels). Colombia, Remote. The salary range for this role is $2,000 - $9,200 per month (gross, in USD). About Sezzle: With a mission to financially empower the next generation, Sezzle is revolutionizing the shopping experience beyond payments, blending cutting-edge tech with seamless, interest-free installment...


  • Colombia, Huila · Datavail · Full-time

    At least 2 years of hands-on experience with AWS - We require at least one AWS associate level certification. - Able to contribute through CloudFormation / Terraform - Good knowledge of AWS core services related to Infrastructure (EC2, ECS, EKS, RDS, EBS etc.), Networking (VPC, Network Security Groups, Peering, Transit Gateway, site-to-site VPN etc.),...


  • Colombia · Times Internet · Full-time

    About the Role: Times Internet is a leading digital media company looking for an experienced DevOps engineer to join our team. As a DevOps engineer, you will play a critical role in simplifying and enhancing the lives of millions of users. Key Responsibilities: Collaborate with cross-functional teams to design, build, and maintain CI/CD pipelines, automate...


  • Colombia · Tbwa ChiatDay Inc · Full-time

    Senior Site Reliability Engineer (Colombia). Colombia, Remote. The salary range for this role is $5,000 - $9,200 per month (gross, in USD). About Sezzle: With a mission to financially empower the next generation, Sezzle is revolutionizing the shopping experience beyond payments, blending cutting-edge tech with seamless, interest-free installment plans...


  • Colombia · Sezzle · Full-time

    Unlock the Future of E-commerce as a Senior Site Reliability Engineer at Sezzle. We are seeking a highly skilled and motivated Senior Site Reliability Engineer to join our dynamic team. As a key member of our Infrastructure and Security team, you will play a vital role in designing, building, running, improving, and scaling the infrastructure that engineering...

  • Reliability Expert

    1 month ago


    Colombia · Rocket · Full-time

    Role Summary: Rocket.Chat is seeking a highly skilled Senior Site Reliability Engineer to join their team. As a key member of the team, you will play a critical role in ensuring the reliability, scalability, and performance of Rocket.Chat. Mandatory Skills: Strong proficiency in Linux/Unix systems administration; proficiency in scripting languages such as Python,...


  • Colombia · Captivate Io Ltd · Full-time

    **Job Overview:** We are seeking a skilled Site Reliability Engineer to join our team at Captivate Io Ltd. The ideal candidate will have extensive experience in DevOps practices, continuous integration and deployment (CI/CD) pipelines, and container orchestration with Kubernetes. **Key Responsibilities:** Infrastructure Automation: Design, implement, and...


  • Colombia · WIZELINE · Full-time

    At Wizeline, we're a team of innovative problem solvers dedicated to delivering exceptional digital experiences for our clients. We believe that great technology begins with outstanding talent and diversity of thought. Our business is built on doing well and doing good, and our values of Ownership, Innovation, Community, and Diversity & Inclusion are deeply...


  • Colombia · Gorilla Logic · Full-time

    Gorilla Logic, a nearshore Agile team provider, seeks an experienced Senior Cloud Engineer to lead the development of scalable systems. This full-time remote role is ideal for someone who excels in infrastructure as code (IaC), software development, and continuous integration. About Gorilla Logic: With offices in the United States, Costa Rica, Colombia, and...


  • Colombia · Rocket · Full-time

    Senior Site Reliability Engineer: Reliability and Scalability Expert | Rocket.Chat | Remote. This position is for applicants with expertise in cloud infrastructure and reliability engineering. As a Senior Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of Rocket.Chat. Your expertise in designing,...

  • QA Automation Engineer

    3 months ago


    Colombia · Software Defined Automation GmbH · Full-time

    Posted August 20, 2024. Antioquia, Colombia. On-site, full-time. About the position: Linqia is looking for a Software Quality Assurance Engineer to develop and execute both manual and automated tests to ensure product quality. As a Software QA Engineer, you will estimate, plan, and coordinate testing activities. You will also ensure that quality...


  • Colombia · Gorilla Logic · Full-time

    Job Overview: Gorilla Logic offers a unique opportunity for a Lead DevOps Engineer to join our team. As a key member of our Agile team, you will be responsible for driving the technical aspects of our DevOps practices. Responsibilities: Troubleshoot production issues and ensure site reliability through proactive monitoring and disaster recovery planning. Develop...


  • Colombia · Two95 International Inc. · Full-time

    Responsibilities: Day-to-day administration, monitoring, and maintenance related to routers, switches, firewalls, load balancers, packet shapers, wireless systems, and circuits. Design, implement, and monitor network systems related to Cisco network routers, switches, firewalls, load balancers, WiFi systems, Circuits: WAN/MPLS, internet, replication,...