Searchability®

Engineering Lead

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for an Engineering Lead focused on observability and operational resilience, offered as a 6-month contract with an undisclosed pay rate. It requires expertise in Prometheus and Grafana, along with 6+ years in SRE or DevOps within enterprise environments.
🌎 - Country
United Kingdom
💱 - Currency
£ GBP
-
💰 - Day rate
Unknown
-
🗓️ - Date
February 12, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Hybrid
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
London
-
🧠 - Skills detailed
#Data Ingestion #Deployment #Strategy #Grafana #Observability #Documentation #Logging #API (Application Programming Interface) #Leadership #DevOps #Monitoring #Automation #Prometheus
Role description
NEW CONTRACT ROLE - ENGINEERING LEAD (OBSERVABILITY / SRE) | ASAP START | UK (Remote / Hybrid) | 6-Month Contract | Possible Extension | London, Manchester, Birmingham or Edinburgh

THE OPPORTUNITY
We're looking for an experienced Engineering Lead to support a critical enterprise observability and operational resilience programme. This role is focused on leading the uplift of monitoring, alerting, and end-to-end service visibility across business-critical applications. It's ideal for a senior, hands-on engineering lead with deep Prometheus and Grafana expertise, capable of guiding best practices across SRE, platform, and application teams.

THE ROLE
• Lead collaboration with Application Stewards and Site Reliability Engineers (SREs) to confirm critical services and assets in scope for monitoring verification and uplift
• Work with EMAS to analyse Prometheus scrape coverage, exporter deployment, and Grafana dashboard availability for critical applications
• Drive improvements across monitoring configuration, alert quality, metrics, dashboards, KPIs, SLIs, and SLOs
• Lead the optimisation of alerting to ensure alerts are reliable, actionable, and noise-optimised, applying Alertmanager best practices
• Oversee delivery of automated end-to-end business flow visibility through Grafana service maps, dependency visualisation, and topology integrations
• Review observability roles and responsibilities and recommend improvements aligned to Operational Resilience standards
• Champion automation and API-driven approaches for dashboard provisioning, alert management, and data ingestion
• Ensure clear documentation of standards, configurations, and improvements delivered

TECHNICAL SKILLS / REQUIREMENTS
Strong hands-on and leadership experience with:
• Prometheus - instrumentation strategy, exporters, service discovery, custom metrics, PromQL, recording rules, alerting rules, HA architectures (Thanos, Cortex, Mimir)
• Grafana - dashboard and panel design, alerting and routing, synthetic monitoring, Loki, real user monitoring (e.g. Grafana Faro)
• Observability Ecosystem - integration of metrics, logs, and traces (Loki, Tempo, OpenTelemetry), APIs and automation

PROFILE
• Proven experience as a Senior Engineer, Technical Lead, or Engineering Lead within SRE, Observability, DevOps, or Platform Engineering
• Comfortable leading technical direction while remaining hands-on
• Strong stakeholder engagement and communication skills
• Experience operating in complex, enterprise-scale or regulated environments
• Typically 6+ years' experience in reliability engineering, monitoring, or observability-focused roles

KEYWORDS
Engineering Lead, Observability Engineering, Site Reliability Engineering, SRE, Prometheus, Grafana, Alertmanager, PromQL, Monitoring, Operational Resilience, DevOps, Platform Engineering, Metrics, Logging, Tracing, OpenTelemetry, Loki, Tempo, Thanos, Cortex, Mimir