Site Reliability Engineer
2 days ago
Company Overview:
Lean Tech is a rapidly expanding organization based in Medellín, Colombia. We pride ourselves on having one of the most influential networks in software development and IT services for the entertainment, financial, and logistics sectors. Our growth plans offer professionals many opportunities to advance their careers and grow substantially. Joining our team means working with large engineering teams across Latin America and the United States and contributing to cutting-edge developments in multiple industries.
Currently, we are seeking a Site Reliability Engineer (SRE) to join our team. Here are the challenges that our next warrior will face and the requirements we look for:
Position Title: Site Reliability Engineer (SRE)
Location: Remote (Colombia)
What you will be doing:
This senior-level position is focused on the design, implementation, and maintenance of robust, scalable, and high-performing infrastructure. The primary purpose of this role is to collaborate closely with development teams to ensure system stability and scalability through advanced automation and monitoring improvements. Key responsibilities include architecting, deploying, and maintaining systems on AWS, managing Kubernetes clusters, and developing CI/CD pipelines. This position requires an advanced understanding of AWS, Kubernetes, Prometheus, and Grafana, as well as proficiency in scripting with Python, Bash, or Go. The role is integral to the company’s broader mission, emphasizing streamlined integration and deployment within a collaborative work environment.
- Architect and maintain scalable, reliable systems on AWS, following AWS best practices.
- Oversee Kubernetes clusters to ensure optimal performance and availability in production environments.
- Develop and implement comprehensive monitoring and visualization strategies utilizing Prometheus and Grafana.
- Define, measure, and report on SLOs, SLIs, and SLAs to continuously enhance system reliability and performance.
- Drive automation of operational tasks through Infrastructure as Code tools like Terraform and CloudFormation.
- Create robust CI/CD pipelines to facilitate seamless and efficient software deployments.
- Perform in-depth root cause analyses on production issues and implement comprehensive solutions to prevent recurrence.
- Design, update, and manage detailed runbooks and escalation processes to improve incident management efficiency.
- Collaborate closely with development and DevOps teams to ensure effective integration and deployment processes.
- Document systems, configurations, and processes with precision to support operational continuity and knowledge sharing.
Required Skills & Experience:
- Advanced proficiency in AWS services for architecting, deploying, and maintaining scalable and reliable systems.
- Advanced expertise in managing Kubernetes in production environments to ensure high availability and performance.
- Strong proficiency in Prometheus for monitoring and Grafana for visualization.
- Intermediate understanding and use of CI/CD tools such as GitHub Actions, Jenkins, GitLab CI/CD, or CircleCI.
- Intermediate proficiency with Infrastructure as Code tools like Terraform or CloudFormation.
- Experience with configuration management tools including Ansible, Chef, or Puppet.
- Proficient in scripting languages such as Python, Bash, or Go.
- Solid understanding of Linux/Unix systems and networking concepts.
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
- Minimum of 3 years in a Site Reliability Engineer or DevOps role.
Nice to Haves:
- Experience with log aggregation tools such as ELK Stack or Fluentd for efficient log management.
- Knowledge of database systems, both SQL and NoSQL, to support diverse data storage needs.
- Familiarity with service meshes like Traefik Mesh, Istio, or Linkerd to enhance microservices communication.
- Experience with cloud-native application development and serverless architectures.
- Excellent problem-solving skills with a focus on improving system efficiency and performance.
- Strong communication and collaboration abilities for effective team interaction.
Soft Skills:
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration abilities, with the capacity to work effectively across different time zones.
Why you will love Lean Tech:
- Join a powerful tech workforce and help us change the world through technology.
- Professional development opportunities with international customers.
- Collaborative work environment.
- Career path and mentorship programs that will lead to new levels.
Join Lean Tech and contribute to shaping the data landscape within a dynamic and growing organization. Your skills will be honed, and your contributions will play a vital role in our continued success. Lean Tech is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.