Site Reliability Engineer (DevOps)

London, England, United Kingdom · Engineering

Description

About Signal

Signal has built a world-class artificial intelligence platform that turns the problem of information overload on its head! It ingests millions of articles and posts a day from millions of sources, from news media to financial regulation. Using machine learning techniques, it has become highly effective at extracting the key information from this content and aggregating it in a single platform. Users can monitor key people, places, events, companies and topics in real time, as well as track changes and measure impact over time, across more content than anyone could ever hope to read! How awesome is that!

It gets faster and more accurate the more data it consumes. With more data points to work with, it can show past, emerging and future trends in real time. It picks up on nuances and combines them to highlight the next big risk, opportunity or crisis that might affect organisations. It makes the unknown, known.

You will be part of Signal’s infrastructure team.

What will you be doing?

Responsibilities:

  • Proactively monitor and improve the performance and reliability of our platform across the full stack (from the application code to the cloud infrastructure)
  • Help to troubleshoot production issues
  • Support the development team in tuning applications, runtimes, containers and systems
  • Write code to implement tools, automate processes and to enhance monitoring
  • Support and guide our development team in building deployment pipelines and in increasing the automation and security of our systems
  • Architect and implement the next generation of our cloud infrastructure platform to support ongoing growth
  • Shape and promote SRE and DevOps practices and processes to support a growing team


Our Stack:

  • Microservices: mostly written in Clojure, Python and JavaScript, running as Docker containers on AWS
  • AWS: EC2, ECS, Lambda, SQS, Kinesis, RDS (Postgres) and ElastiCache (Redis)
  • Elasticsearch: we operate and continuously enhance a large cluster at the core of our product
  • Infrastructure automation: Terraform, Ansible, Packer and Python
  • Monitoring and logging: Prometheus, Grafana, Fluentd and CloudWatch


You will be:

  • Friendly, approachable and able to collaborate with both technical and non-technical colleagues
  • Curious and eager to learn
  • Someone who strives for simple (but not simplistic!) solutions

Requirements

Your skills:

  • Experience with at least one of the following: Python, Ruby, Go, Java or JavaScript
  • Experience with Linux system administration, networks, internet protocols (e.g. HTTP) and databases
  • Automation skills, whether with existing tools or with tools you build yourself
  • You can work with a high degree of autonomy: you know how to manage your time effectively


Ideally you also have experience with:

  • Terraform
  • Prometheus and Grafana or similar
  • Elasticsearch or other distributed databases
  • Performance tuning (e.g. JVM, Node.js, Python)

Benefits

  • Competitive salary
  • Flexible working
  • Pension plan
  • Unlimited holiday entitlement
  • Company MacBook & Apple equipment
  • Monthly team lunch