Current jobs related to Site Reliability Engineer - WorkFromHome - N-iX
-
Site Reliability Engineer — Remote, Kubernetes
1 week ago
WorkFromHome, Colombia BairesDev Full-time
A leading technology solutions provider is seeking a Site Reliability Engineer to support and administer cloud project infrastructure. The role involves ensuring service availability and implementing CI/CD pipelines for automation. Candidates should have over 2 years of experience as an Infrastructure Engineer, familiarity with Kubernetes, and proficiency...
-
Site Reliability Engineer
1 week ago
WorkFromHome, Colombia BairesDev Full-time
At BairesDev®, we've been leading the way in technology projects for over 15 years. We deliver cutting-edge solutions to giants like Google and the most innovative startups in Silicon Valley. Our diverse 4,000+ team, composed of the world's Top 1% of tech talent, works remotely on roles that drive significant impact worldwide. When you apply for this...
-
Senior Engineering Manager, Site Reliability
1 day ago
WorkFromHome, Colombia Next League Full-time
As the Senior Manager of Site Reliability Engineering, you will be responsible for ensuring the reliability, scalability, and efficiency of a wide range of client systems, including organizations such as NASCAR, USOPC, and TGL....
-
Site Reliability
1 day ago
WorkFromHome, Colombia Canonical Full-time
Site Reliability / GitOps Engineer
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading...
-
Reliability Engineer
1 week ago
WorkFromHome, Colombia Graymont Full-time
Reliability Engineer. Full-Time, Permanent. Any province in which Graymont has operations.
Graymont is seeking a Reliability Engineer to join our team and provide guidance and support to our network of facilities across North America to ensure the highest levels of performance and reliability for our plant equipment. Reporting to the Maintenance Program...
-
Platform Reliability Engineer — Remote
1 week ago
WorkFromHome, Colombia Xebia Full-time
A global technology consultancy is seeking a skilled Platform Engineer in Colombia to ensure reliability and performance of distributed systems. This remote role involves leading incident responses, designing reliability practices, and enhancing observability capabilities. Candidates should have over 5 years of relevant experience and strong communication...
-
Plant Reliability Engineer
1 week ago
WorkFromHome, Colombia Graymont Full-time
A leader in calcium-based solutions is seeking a Reliability Engineer to enhance equipment performance across facilities in North America. Responsibilities include conducting critical evaluations and developing preventive maintenance programs while ensuring safety standards. Candidates should possess a Bachelor's in engineering, strong computer skills, and a...
-
Remote Principal Python Engineer: Scale
1 day ago
WorkFromHome, Colombia Medium Full-time
A leading tech company is seeking an engineer to develop a scalable data architecture. The ideal candidate will ensure reliability, support rapid growth, and enhance productivity through innovative solutions. This is a fully remote role, allowing candidates from anywhere to apply. Responsibilities include optimizing databases and managing system performance...
-
Senior SRE Manager: Lead Reliability
1 day ago
WorkFromHome, Colombia Next League Full-time
A leading tech company is seeking a Senior Engineering Manager for Site Reliability. This remote position involves leading a team of five engineers while ensuring system reliability and performance for clients. Candidates must have extensive experience in SRE, strong leadership capabilities, and technical expertise in cloud services. The role offers...
-
Remote SRE
1 day ago
WorkFromHome, Colombia Canonical Full-time
A leading open-source software company is hiring a Site Reliability / GitOps Engineer to join their Information Systems team. This remote role focuses on automation and requires experience in IT operations, Infrastructure as Code, and Linux. The successful candidate will drive operations automation and support Canonical's IT production services used by...
Site Reliability Engineer
2 hours ago
N-iX
Bogota, D.C., Capital District, Colombia

Overview
We are looking for a Site Reliability Engineer (SRE) to help monitor, maintain, and scale software production environments, with a primary focus on onboarding new microservices. You will work closely with development and platform teams to automate and program-manage the onboarding lifecycle, from requirements and environment setup through deployment, testing, documentation, and handover, ensuring reliability, scalability, performance, and compliance at every step.

Responsibilities
Lead and support the end-to-end onboarding process for new microservices into production environments.
Identify gaps in the current onboarding workflow (deployment, configuration, monitoring, scaling, etc.) and automate them.
Provide program management for onboarding activities, including timelines, dependencies, and stakeholder communication.
Collaborate with development and operations/platform teams to ensure smooth and consistent rollout of new services.
Design and implement monitoring, logging, and alerting for all onboarded services.
Ensure comprehensive metrics collection (e.g., availability, latency, error rates, throughput) to support SLOs/SLIs.
Tune alerts to minimize noise while ensuring rapid detection of and response to production issues.
Perform load and stress testing to validate that services can scale to meet current and projected demand.
Implement and refine auto-scaling mechanisms and capacity planning practices.
Conduct ongoing performance tuning and optimization to achieve minimal latency and high throughput.
Drive high service reliability and uptime for all onboarded microservices.
Help teams design and implement fault-tolerant architectures, including failover and redundancy mechanisms.
Work with teams to adopt SRE best practices (e.g., error budgets, post-incident reviews, runbooks).
Ensure all onboarded services meet security and compliance requirements.
Integrate security best practices into deployment, monitoring, and operational processes.
Maintain audit trails and documentation for onboarding activities to support regulatory and internal compliance.
Create detailed documentation for the service onboarding process, including standards, patterns, and templates.
Develop and maintain runbooks, playbooks, and SOPs for ongoing operations.
Conduct training sessions and workshops for internal teams to enable self-service onboarding and long-term maintainability.
Participate in requirements analysis for new services; define onboarding success criteria and KPIs.
Develop onboarding plans outlining steps, timelines, responsibilities, and acceptance criteria; present plans to stakeholders for review and approval.
Prepare and validate environments, ensuring appropriate access, permissions, and tooling are in place.
Conduct comprehensive functional, performance, reliability, and security testing prior to go-live.
Provide post-onboarding support, monitoring services to ensure continued reliability and quickly addressing any issues that arise.

Required Qualifications
Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role in microservices-based environments.
Strong understanding of microservices architecture, distributed systems, and cloud-native concepts.
Hands-on experience with:
  Production monitoring, logging, and alerting (e.g., metrics, tracing, log aggregation tools).
  Automation of deployment and operational workflows (e.g., scripts, pipelines, IaC, or similar).
  Load/performance testing and capacity planning.
Demonstrated ability to improve service reliability, scalability, and performance in production.
Familiarity with security best practices related to service deployment, monitoring, and operations.
Experience working across cross-functional teams (development, operations, security, compliance) to deliver complex initiatives.
Excellent documentation, communication, and stakeholder management skills.

Preferred Qualifications
Experience defining and tracking SRE KPIs/SLOs/SLIs for onboarding and production services.
Background in program or project management of technical initiatives (especially service onboarding or platform rollouts).
Prior experience in high-availability, regulated, or large-scale SaaS environments.

We offer
Flexible working format: remote, office-based or flexible
A competitive salary and good compensation package
Personalized career growth
Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
Active tech communities with regular knowledge sharing
Education reimbursement
Memorable anniversary presents
Corporate events and team buildings
Other location-specific benefits
Not applicable for freelancers