

Patton Labs Inc
SRE - Google Distributed Cloud Edge (GDCE) Based Restaurant Workloads
This role is for an SRE specializing in Google Distributed Cloud Edge for restaurant workloads. The contract length and pay rate are not specified. Key skills include Google Edge, Kubernetes, Terraform, and observability tools.
🌎 Country: United States
💱 Currency: $ USD
💰 Day rate: Unknown
🗓️ Date: March 4, 2026
🕒 Duration: Unknown
🏝️ Location: Unknown
📄 Contract: Unknown
🔒 Security: Unknown
📍 Location detailed: Chicago, IL
🧠 Skills detailed: #Cloud #Terraform #Monitoring #Logging #Alation #Automation #Deployment #Kubernetes #Leadership #Prometheus #Observability #Scala
Role description
Site Reliability Engineering (SRE) Leadership
• Leads the SRE framework for GDCE-based restaurant workloads, applying Google’s core principles around SLIs, SLOs, error budgets, and the four golden signals (latency, traffic, errors, saturation).
• Defines end-to-end reliability objectives for the GDC Connect platform, ensuring consistent behavior across 24,000+ geographically distributed restaurant nodes.
• Establishes runbooks, playbooks, and automated remediation workflows, reducing MTTR and ensuring consistent responses across global operations.
• Implements proactive failure detection using distributed monitoring patterns and Google SRE best practices, enabling early identification of degraded services before they impact restaurant operations.
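For candidates less familiar with the error-budget principle named above, the arithmetic can be sketched as follows. This is a minimal illustration only: the function name, the 99.9% SLO target, and the request counts are hypothetical and are not taken from the GDC Connect platform.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means no budget has been used; 0.0 or less means it is exhausted.
    slo_target is the success-rate objective, e.g. 0.999 for 99.9%.
    """
    if total <= 0:
        # No traffic observed, so nothing has been spent.
        return 1.0
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures == 0:
        # A 100% SLO leaves no budget at all.
        return 1.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.0%} of the error budget remains")
```

A result near zero is typically the signal to slow or pause rollouts until reliability recovers.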
Platform Reliability Engineering
• Architects a resilient GDCE platform capable of operating in the low-bandwidth or intermittent-connectivity environments typical of QSR (quick-service restaurant) stores.
• Designs high-availability clusters using Google Distributed Cloud Edge capabilities such as local control planes, fleet registration, and secure edge node lifecycle management.
• Sets up self-healing infrastructure patterns using Kubernetes health checks, automatic restarts, policy controls, and declarative GitOps-driven configuration.
• Enables versioned rollouts, canary deployments, and blue-green upgrades, ensuring zero-downtime restaurant service continuity during releases.
Observability, Monitoring, and Alerting
(Aligned to Google Cloud Operations Suite)
• Integrates GDCE clusters with Cloud Logging, Cloud Monitoring, Managed Service for Prometheus, and Cloud Trace to establish full-stack observability.
• Creates centralized alarms and event consolidation for store-level services, covering GDC Connect modules, POS-facing services, ordering APIs, kiosk integration points, and network health.
• Defines multi-level alerting policies (store → region → global) to ensure the right stakeholders are notified with context-rich insights.
• Builds actionable dashboards and heatmaps for real-time fleet visibility and rollout readiness.
Operational Excellence & Platform Support
• Designs and operationalizes the Platform Support Model (L1.5/L2/L3) for GDCE-backed restaurant workloads.
• Establishes ticket triage workflows, escalation paths, incident swarming practices, and KPIs such as MTTA, MTTR, and platform uptime targets.
• Oversees the release certification pipeline, validating every store release through automated tests, conformance checks, resource baselines, and failure rollback mechanisms.
• Collaborates closely with McDonald's Global Operations, Google engineering groups, and Accenture to maintain a compliant, stable, and governed platform for all restaurant markets.
Automation & Tooling
• Drives automation using Terraform, Config Sync, GitOps pipelines, and Google-provided GDCE provisioning frameworks.
• Automates:
o Cluster provisioning
o Edge node onboarding
o Software rollout orchestration
o Store-level configuration sync
• Ensures fleet-wide consistency through declarative definitions and automated drift detection.
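As a rough illustration of the drift-detection idea above, the sketch below compares a declared store configuration against what is observed on a node. It is not the Config Sync API or any Google-provided tooling; the function name, the `ABSENT` marker, and the sample keys and values are all hypothetical.

```python
ABSENT = "<absent>"  # hypothetical marker for a key missing on one side


def detect_drift(declared: dict, observed: dict) -> dict:
    """Return {key: (declared_value, observed_value)} for every mismatch."""
    drift = {}
    # Check the union of keys so extra or missing settings are also caught.
    for key in sorted(declared.keys() | observed.keys()):
        want = declared.get(key, ABSENT)
        have = observed.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical declared (Git) state versus observed (store node) state.
declared = {"image": "pos-api:1.4.2", "replicas": 2}
observed = {"image": "pos-api:1.4.1", "replicas": 2, "debug": True}

for key, (want, have) in detect_drift(declared, observed).items():
    print(f"{key}: declared={want!r} observed={have!r}")
```

In a GitOps setup the declared side would come from the repository and the observed side from the cluster; any non-empty result would trigger reconciliation or an alert.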
High-Scale Distributed Support Mindset
• Brings expertise in managing large, globally distributed footprints, ensuring:
o Zero-impact upgrades
o Predictable deployments
o Scalable edge support
o Efficient troubleshooting at store, region, and fleet levels
• Designs run-at-scale diagnostic routines, remote recovery actions, and self-service operations tooling for store support teams.
What are the mandatory skills and skill proficiencies required for this position?
Strong hands-on experience with Google Edge.






