Job Description
Site Reliability Engineer - Cloud Managed Services IT, London
Company
A cloud managed services business is building out its engineering team following sustained growth in its customer base over recent years. The company continually serves millions of customer devices from 8 data centres globally.
Summary of role
As a site reliability engineer, you will be responsible for designing useful, scalable and secure monitoring systems with a focus on automation.
The SREs are responsible for building and scaling the cloud through which millions of customers across the world access their devices. The platform has seen substantial growth over recent years and handles over 4 billion HTTP requests per day.
The team automates cluster scaling so that monitoring resources are deployed automatically, and uses ElasticSearch clusters holding up to 1 petabyte of data for bespoke cases. Experience with the ELK stack (ElasticSearch, LogStash, Kibana) would be ideal.
You will need experience working in enterprise or cloud environments and with scripting (Ruby, Scala, Python or Bash).
Experience working in a team that supports customer bases would be beneficial, particularly in an external-facing product environment.
Key Skills
- Experience designing and deploying enterprise or cloud environments
- Scripting experience (Python, Bash, Scala, Ruby)
- Focus on automation
- ELK stack (ElasticSearch, LogStash, Kibana)
- Docker, containers (Kubernetes a bonus)
Next Step
If you are an experienced technologist with the relevant skillset, please apply for this role.