IntePros

Principal Azure Capacity & Resiliency Engineer

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Principal Azure Capacity & Resiliency Engineer. The contract length is not stated; the listing shows a day rate of 800 USD. Key skills include Azure expertise, capacity modeling, and financial stewardship; experience in FedRAMP High environments is required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
800
πŸ—“οΈ - Date
April 15, 2026
🕒 - Duration
Unknown
🏝️ - Location
Unknown
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
New York City Metropolitan Area
🧠 - Skills detailed
#Monitoring #Azure #Vault #Disaster Recovery #Security #Storage #Ansible #Trend Analysis #Databases #AutoScaling #Cloud #Strategy #Capacity Management #Prometheus #Deployment #Grafana #Azure SQL #Forecasting #Containers #Infrastructure as Code (IaC) #Compliance #SQL (Structured Query Language) #Firewalls #Terraform
Role description
Principal Azure Capacity & Resiliency Engineer

We are seeking a Principal Azure Capacity & Resiliency Engineer to lead capacity planning, financial stewardship, and resilience engineering for a high-visibility government cloud initiative operating under FedRAMP High regulatory requirements. This role is dedicated to a mission-critical Azure environment and reports directly to the Capacity Management Lead. Architecture standards are defined by a separate architecture team; this position owns the modeling, governance, demand management, and optimization of cloud capacity aligned to those standards.

Role Overview

This is a principal-level engineering role responsible for the end-to-end cloud capacity operating model across compute, storage, containers, networking, and platform services within Azure. The role is primarily focused on modeling, oversight, governance, and optimization. While capable of hands-on configuration when required, this is not a pure architecture or build position.

You will partner closely with SRE, Security, Architecture, Governance, and Finance teams to ensure:
• SLO alignment and performance integrity
• Validated disaster recovery capacity
• Engineered buffer standards
• U.S. and U.S. Territories–only processing and storage compliance
• Financial optimization aligned to demand forecasts

What You’ll Own

Cloud Capacity Strategy & Engineering
• Lead the end-to-end capacity lifecycle: modeling, forecasting, monitoring, tuning, and governance
• Build and maintain service-level capacity models across compute, AKS, databases, storage, networking, and cryptographic services
• Establish and enforce engineered buffer standards (N+1/N+2, headroom %, runway weeks) aligned to service criticality
• Design autoscaling guardrails aligned to SLI/SLO performance metrics
• Translate telemetry and trend analysis into scaling, performance, and cost decisions
• Partner with product and engineering teams to onboard and offboard workloads with clear demand modeling and cost accountability

Resiliency & Disaster Recovery
• Validate hot/warm/cold failover capacity without compromising steady-state performance
• Conduct criticality analysis to prioritize high-impact services
• Reserve quota and buffer for disaster and maintenance scenarios
• Align resiliency strategy to FedRAMP High expectations and government regulatory standards

Financial Stewardship & Demand Management
• Own reservation strategy modeling and commitment planning across Azure services
• Balance resilience, performance, and cost using Reservations, Savings Plans, and rightsizing strategies
• Forecast demand based on product roadmaps and business growth
• Drive financial stewardship through optimized commitment-based purchasing
• Deliver executive-ready dashboards including utilization, saturation, headroom, scaling efficiency, DR readiness, and cost-to-performance metrics

Compliance & Governance
• Contribute to audit evidence and regulatory artifacts (SSP updates, control narratives, POA&M tracking, continuous monitoring artifacts)
• Enforce U.S. and U.S. Territories–only processing, storage, backup, and disaster recovery requirements
• Participate in Change Advisory Boards (CAB) as the capacity and resiliency representative
• Maintain configuration governance discipline across Azure services

Required Experience
• 10–12+ years in infrastructure capacity planning and performance engineering
• Expert-level Azure knowledge
• Demonstrated capacity modeling and forecasting expertise
• Strong experience with AKS scaling, throughput, and performance optimization
• Deep understanding of Azure cost structures, reservations, and financial modeling
• Hands-on experience operating in FedRAMP High or similarly regulated government environments (required)
• Strong understanding of resiliency patterns and disaster recovery options
• Familiarity with cloud security controls, firewalls, and compliance constraints
• Experience working with Infrastructure as Code and CI/CD gated deployments
• Strong problem-solving ability and executive communication skills

Preferred Experience
• Financial services background
• Government program experience
• Hands-on experience with Terraform or Bicep
• Knowledge of Ansible
• Working knowledge of Azure networking (VNets, routing, firewall flows, private endpoints)
• Experience with cloud deployment methodologies

Technical Environment
• Azure Monitor, Log Analytics, Metrics, Advisor, Cost Management
• Reservations, Savings Plans, Quota Management
• AKS, VMSS, App Service autoscaling (HPA/VPA)
• Prometheus, Grafana
• Azure SQL / Managed Instances
• Azure Firewall, NSGs, Private Endpoints, Bastion
• Azure Key Vault / Managed HSM
• Terraform, Bicep, ARM
• CI/CD gated deployments
• Azure Policy and governance initiatives