

Expedite Talent Solutions
Lead/Architect SRE Consultant - 17+ Yrs
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Lead/Architect SRE Consultant with 17+ years of experience. The contract length is unspecified, with a competitive pay rate. Key skills include compliance, Grafana, observability, logging, monitoring, and automation. Industry experience in SRE governance is essential.
🌎 - Country: United States
💱 - Currency: $ USD
💰 - Day rate: Unknown
🗓️ - Date: March 18, 2026
🕒 - Duration: Unknown
🏝️ - Location: Unknown
📄 - Contract: Unknown
🔒 - Security: Unknown
📍 - Location detailed: United States
🧠 - Skills detailed: #Monitoring #Logging #Alation #Grafana #Compliance #Observability #Scala #Automation
Role description
Key Responsibilities
• Defining and implementing SLIs/SLOs and reliability targets that align with the department's Golden Pathways
• Building and operationalizing observability standards (metrics, logs, traces)
• Designing and evolving incident management and RCA practices
• Driving automation and reliability engineering workflows
• Establishing service health dashboards and telemetry pipelines
• Working closely with engineering teams to embed reliability into development and operations
Design and Build Central SRE Operating View
Implement golden-pathway telemetry across:
• App Performance Monitoring (APM) – Service response times, transaction bottlenecks
• Logging & Tracing – correlated logs, structured tracing
• Event & Alerting – actionable event definitions tied to severity
• RCA/Tagging Compliance Monitoring – auto-tagging and RCA lifecycle ingestion
Build executive-level scorecards and dashboards via Grafana and ServiceNow Performance Analytics:
• Per-app reliability score
• SRE maturity score
• Mean time to detect/respond/restore (MTTx)
• Escalation patterns and failure root-cause trends
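For illustration only, the MTTx metrics named in the scorecard list could be derived from incident timestamps along the following lines. This is a minimal Python sketch, not the client's method: the record layout and field names (`started`, `detected`, `acknowledged`, `restored`) are assumptions and do not reflect a ServiceNow or Grafana schema, and the sample incidents are made up.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {
        "started": datetime(2026, 3, 1, 10, 0),
        "detected": datetime(2026, 3, 1, 10, 5),
        "acknowledged": datetime(2026, 3, 1, 10, 15),
        "restored": datetime(2026, 3, 1, 11, 0),
    },
    {
        "started": datetime(2026, 3, 2, 14, 0),
        "detected": datetime(2026, 3, 2, 14, 15),
        "acknowledged": datetime(2026, 3, 2, 14, 20),
        "restored": datetime(2026, 3, 2, 16, 0),
    },
]


def mean_minutes(records, start_key, end_key):
    """Average elapsed minutes between two timestamps across all records."""
    return mean(
        (r[end_key] - r[start_key]).total_seconds() / 60 for r in records
    )


# Mean time to detect: incident start until detection.
mttd = mean_minutes(incidents, "started", "detected")
# Mean time to respond: detection until acknowledgement.
mtta = mean_minutes(incidents, "detected", "acknowledged")
# Mean time to restore: incident start until service restoration.
mttr = mean_minutes(incidents, "started", "restored")

print(f"MTTD={mttd} min, MTTA={mtta} min, MTTR={mttr} min")
```

Where each clock starts and stops (for example, whether restore time is measured from incident start or from detection) is a definitional choice that would need to be agreed per organization.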
Enable Long-Term SRE Governance
• Establish SRE telemetry ingestion pipelines
• Design alert logic for low-quality signals
• Build RCA tagging enforcement playbooks
• Deliver runbooks and telemetry integration guides per application type
Centralized SRE Golden Dashboard – Single Pane of Glass
• A central pillar of this initiative is the creation of a Centralized SRE Golden Dashboard, serving as a Single Pane of Glass for executive and operational visibility across all 40+ applications
The dashboard will:
• Aggregate key telemetry: reliability metrics, RCA themes, MTTR, incident volumes, tag compliance, alert noise, performance degradation, and resilience scoring.
• Display per-app SRE health scores based on the maturity framework.
Include dynamic drilldowns into:
• Incident hygiene (tagging, closure quality, RCA ownership)
• SLAs/OLAs/SLIs/SLOs/error budgets, cleanly architected
• Alerting trends and noise correlation
• Capacity/resiliency warnings
• Serve as the definitive executive reporting source – used for monthly reviews, CIO/VP visibility, and roadmap investment decisions.
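As a rough illustration of the SLO and error-budget drilldown mentioned above, the sketch below computes how much of an error budget remains for one application over one SLO window. It assumes a simple event-based SLI (good events divided by total events); the function name, parameters, and sample numbers are hypothetical and not taken from the role description.

```python
def error_budget_remaining(slo_target, total_events, bad_events):
    """Return the unspent fraction of the error budget for one SLO window.

    slo_target   -- target success ratio, e.g. 0.99 for a 99% SLO (assumed input)
    total_events -- all events counted by the SLI in the window
    bad_events   -- events that violated the SLI in the window

    A negative result means the budget is overspent.
    """
    if total_events == 0:
        return 1.0  # no traffic, nothing spent
    allowed_bad = (1.0 - slo_target) * total_events
    if allowed_bad == 0:
        # A 100% target leaves no budget: any failure exhausts it.
        return 0.0 if bad_events else 1.0
    return 1.0 - bad_events / allowed_bad


# Hypothetical window: 99% SLO, 10,000 requests, 25 failures.
remaining = error_budget_remaining(0.99, 10_000, 25)
print(f"Error budget remaining: {remaining:.0%}")
```

A per-app figure like this is one possible input to the health scores on the dashboard; how it is weighted against other signals would depend on the maturity framework in use.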






