Expedite Talent Solutions

Lead/Architect SRE Consultant - 17+ Yrs

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Lead/Architect SRE Consultant with 17+ years of experience. The contract length and pay rate are unspecified. Key skills include compliance, Grafana, observability, logging, monitoring, and automation. Industry experience in SRE governance is essential.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
March 18, 2026
🕒 - Duration
Unknown
🏝️ - Location
Unknown
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
United States
🧠 - Skills detailed
#Monitoring #Logging #Alation #Grafana #Compliance #Observability #Scala #Automation
Role description
Key Responsibilities
• Defining and implementing SLIs/SLOs and reliability targets that align with the department's Golden Pathways
• Building and operationalizing observability standards (metrics, logs, traces)
• Designing or evolving existing incident management and RCA practices
• Driving automation and reliability engineering workflows
• Establishing service health dashboards and telemetry pipelines
• Working closely with engineering teams to embed reliability into development and operations

Design and Build a Central SRE Operating View
Implement golden-pathway telemetry across:
• App Performance Monitoring (APM): service response times, transaction bottlenecks
• Logging & Tracing: correlated logs, structured tracing
• Event & Alerting: actionable event definitions tied to severity
• RCA/Tagging Compliance Monitoring: auto-tagging and RCA lifecycle ingestion

Build executive-level scorecards and dashboards via Grafana and ServiceNow Performance Analytics:
• Per-app reliability score
• SRE maturity score
• Mean time to detect/respond/restore (MTTx)
• Escalation patterns and failure root-cause trends

Enable Long-Term SRE Governance
• Establish SRE telemetry ingestion pipelines
• Design alert logic for low-quality signals
• Build RCA tagging enforcement playbooks
• Deliver runbooks and telemetry integration guides per application type

Centralized SRE Golden Dashboard: Single Pane of Glass
A central pillar of this initiative is the creation of a Centralized SRE Golden Dashboard that serves as a single pane of glass for executive and operational visibility across all 40+ applications. The dashboard will:
• Aggregate key telemetry: reliability metrics, RCA themes, MTTR, incident volumes, tag compliance, alert noise, performance degradation, and resilience scoring
• Display per-app SRE health scores based on the maturity framework
• Include dynamic drilldowns into:
  • Incident hygiene (tagging, closure quality, RCA ownership)
  • Cleanly architected SLAs, OLAs, SLIs, SLOs, and error budgets
  • Alerting trends and noise correlation
  • Capacity and resiliency warnings
• Serve as the definitive executive reporting source, used for monthly reviews, CIO/VP visibility, and roadmap investment decisions