Site Reliability Engineer

Sheffield | Hybrid

£50,000 - £60,000

Permanent

Software Architecture & Development

For full details, email Daniel Koseoglu: daniel@affecto.co.uk or call 0114 399 3699.

Job Ref: 1120873

What’s it all about?

We’re helping an international software and Gen AI business that’s not just growing—it’s thriving! They’ve built a platform handling over 25 billion events daily and are looking for an experienced Site Reliability Engineer to join the journey.

The Site Reliability Engineering Team is responsible for provisioning and maintaining the cloud infrastructure from development through to production, and for working with the wider Engineering and Product Teams to make sure the product suite is both reliable and cost-efficient. The platform is built on Google Kubernetes Engine (GKE) and uses several other Google Cloud services, including Memorystore, Cloud Datastore, Pub/Sub, BigQuery and Vertex AI, as well as services from other vendors such as Amazon SES.

You’ll work in an environment where your ideas aren’t just heard—they’re implemented! From shaping the infrastructure to collaborating across teams, you’ll play a pivotal role in keeping things running smoothly while innovating for what’s next.

What you’ll be doing

Automate everything: You’ll build and manage infrastructure with tools like Terraform and Ansible, ensuring systems are scalable, secure, and efficient
Solve real problems: Debug production issues, fix them fast, and put measures in place to stop them recurring
Build smarter systems: Design and monitor new services alongside developers, ensuring SLIs and SLOs align with performance and reliability goals
Collaborate across teams: Work with developers, product managers, and security to keep the platform cutting-edge and compliant
Stay ahead of the curve: Proactively manage capacity and plan for growth

What you bring

Cloud expertise: You know your way around cloud infrastructure, with hands-on experience using Terraform, Ansible, or similar tools
Coding chops: Proficiency in Python, Go, or a similar language—and the willingness to pick up new ones as needed
Systems thinker: You understand how systems fail, where bottlenecks hide, and how to design for resilience
Great communicator: You can translate technical details into clear, actionable documentation for your team
Metrics-driven mindset: You can talk performance, cost analysis, and operational metrics like a pro

Why we're excited

Scale: Be part of a team powering a platform processing over 25 billion events a day—and growing fast
Progression opportunities: Whether you want to deepen your technical skills or step into leadership, the path is yours to shape
Broad exposure: Work with cutting-edge tech like Kubernetes, BigQuery, Vertex AI, and more
Global impact: What you build here matters—your work will directly influence systems that countless businesses across the world depend on
Dynamic environment: Forget the corporate grind—this is a place where ideas flow, growth happens, and your work truly makes a difference

Why this role is different

This isn’t just about keeping the lights on. You’ll be at the heart of a team that doesn’t settle for “good enough.” From scaling the infrastructure to improving processes and mentoring others, you’ll shape the future of a rapidly growing platform—and your own career along the way.

Working policy

Hybrid: around 3 days a week in the office in Sheffield.
