Job

Senior Site Reliability Engineer (SRE)

Job type: Contract
Town/City: London
County: Remote
Salary/Rate: Inside IR35
Business Sector: IT
Job ref: CDI - 154112
Post Date: March 17, 2026

Senior Site Reliability Engineer (SRE)
Remote

12-month contract (high chance of extension)

Job Description
Join a global pioneer in the video game industry and own the reliability of high-traffic, revenue-critical platforms used by millions worldwide. As a Senior SRE, you'll shape the architecture, improve platform-wide resiliency, and ensure services stay performant, scalable, and secure. This isn't just about maintaining a single system: you'll influence reliability across multiple services, driving improvements that touch the entire ecosystem.

Key Responsibilities

  • Lead incident response and troubleshooting for production systems, resolving high-severity issues and driving post-incident improvements.
  • Influence architecture to improve platform-wide reliability, resiliency, and operational efficiency, ensuring services remain available under heavy load.
  • Drive containerisation best practices and manage Kubernetes-based workloads at scale.
  • Build and maintain event-driven architectures that scale globally while ensuring fault-tolerance and high availability.
  • Automate infrastructure provisioning, deployment, and monitoring using Infrastructure as Code (Terraform, CloudFormation, Ansible, CDK).
  • Collaborate with engineering, product, and security teams to define SLOs, SLIs, and error budgets across services.
  • Provide mentorship, advocate SRE best practices, and ensure teams are empowered to deliver resilient, reliable systems.

Experience / Must-Have Skills

  • Extensive experience in AWS and AWS-managed services (EC2, Lambda, S3, VPC, CloudWatch, CloudTrail, IAM, EKS, Service Catalog, multi-account environments).
  • Strong Kubernetes / container orchestration experience, including EKS, OpenShift, Docker, and service mesh.
  • Deep understanding of networking fundamentals: DNS, VPCs, routing, load balancing, TCP/IP, firewall policies.
  • Proven track record in incident response and troubleshooting at scale.
  • Hands-on experience with infrastructure automation and CI/CD pipelines.
  • Experience designing event-driven architectures and resilient systems.
  • High level of autonomy, with the ability to influence platform-wide decisions and architect for reliability across services.
  • Ability and desire to mentor junior staff.
  • Bonus: experience in gaming, interactive entertainment, or other high-traffic, global-scale platforms.

If you are interested in this role, please feel free to submit your CV.