Job
Senior Site Reliability Engineer (SRE)
Job type: Contract
Town/City: London
County: Remote
Salary/Rate: Inside IR35
Business Sector: IT
Job ref: CDI - 154112
Post Date: March 17, 2026
Senior Site Reliability Engineer (SRE)
Remote
12-month contract (high chance of extension)
Job Description
Join a global pioneer in the video game industry and own the reliability of high-traffic, revenue-critical platforms used by millions worldwide. As a Senior SRE, you'll shape the architecture, improve platform-wide resiliency, and ensure services stay performant, scalable, and secure. This isn't just about maintaining a single system: you'll influence reliability across multiple services, driving improvements that touch the entire ecosystem.
Key Responsibilities
- Lead incident response and troubleshooting for production systems, resolving high-severity issues and driving post-incident improvements.
- Influence architecture to improve platform-wide reliability, resiliency, and operational efficiency, ensuring services remain available under heavy load.
- Drive containerisation best practices and manage Kubernetes-based workloads at scale.
- Build and maintain event-driven architectures that scale globally while ensuring fault-tolerance and high availability.
- Automate infrastructure provisioning, deployment, and monitoring using Infrastructure as Code (Terraform, CloudFormation, Ansible, CDK).
- Collaborate with engineering, product, and security teams to define SLOs, SLIs, and error budgets across services.
- Provide mentorship, advocate SRE best practices, and ensure teams are empowered to deliver resilient, reliable systems.
Experience / Must-Have Skills
- Extensive experience in AWS and AWS-managed services (EC2, Lambda, S3, VPC, CloudWatch, CloudTrail, IAM, EKS, Service Catalog, multi-account environments).
- Strong Kubernetes / container orchestration experience, including EKS, OpenShift, Docker, and service mesh.
- Deep understanding of networking fundamentals: DNS, VPCs, routing, load balancing, TCP/IP, firewall policies.
- Proven track record in incident response and troubleshooting at scale.
- Hands-on experience with infrastructure automation and CI/CD pipelines.
- Experience designing event-driven architectures and resilient systems.
- A high level of autonomy, with the ability to influence platform-wide decisions and architect for reliability across services.
- Ability and desire to mentor junior staff.
- Bonus: experience in gaming, interactive entertainment, or other high-traffic, global-scale platforms.
If you are interested in this role, please feel free to submit your CV.

