

IntePros
Principal Azure Capacity & Resiliency Engineer
This role is for a Principal Azure Capacity & Resiliency Engineer. The contract length is not specified; the listing gives a day rate of $800 (USD). Key skills include Azure expertise, capacity modeling, and financial stewardship, and experience in FedRAMP High environments is required.
Country: United States
Currency: $ USD
Day rate: 800
Date: April 15, 2026
Duration: Unknown
Location: Unknown
Contract: Unknown
Security: Unknown
Location detailed: New York City Metropolitan Area
Skills detailed: #Monitoring #Azure #Vault #Disaster Recovery #Security #Storage #Ansible #Trend Analysis #Databases #AutoScaling #Cloud #Strategy #Capacity Management #Prometheus #Deployment #Grafana #Azure SQL #Forecasting #Containers #Infrastructure as Code (IaC) #Compliance #SQL (Structured Query Language) #Firewalls #Terraform
Role description
Principal Azure Capacity & Resiliency Engineer
We are seeking a Principal Azure Capacity & Resiliency Engineer to lead capacity planning, financial stewardship, and resilience engineering for a high-visibility government cloud initiative operating under FedRAMP High regulatory requirements.
This role is dedicated to a mission-critical Azure environment and reports directly to the Capacity Management Lead. Architecture standards are defined by a separate architecture team; this position owns the modeling, governance, demand management, and optimization of cloud capacity aligned to those standards.
Role Overview
This is a principal-level engineering role responsible for the end-to-end cloud capacity operating model across compute, storage, containers, networking, and platform services within Azure.
The role is primarily focused on modeling, oversight, governance, and optimization. You should be capable of hands-on configuration when required, but this is not a pure architecture or build position.
You will partner closely with SRE, Security, Architecture, Governance, and Finance teams to ensure:
• SLO alignment and performance integrity
• Validated disaster recovery capacity
• Engineered buffer standards
• U.S. and U.S. Territories-only processing and storage compliance
• Financial optimization aligned to demand forecasts
What You'll Own
Cloud Capacity Strategy & Engineering
• Lead the end-to-end capacity lifecycle: modeling, forecasting, monitoring, tuning, and governance
• Build and maintain service-level capacity models across compute, AKS, databases, storage, networking, and cryptographic services
• Establish and enforce engineered buffer standards (N+1/N+2, headroom %, runway weeks) aligned to service criticality
• Design autoscaling guardrails aligned to SLI/SLO performance metrics
• Translate telemetry and trend analysis into scaling, performance, and cost decisions
• Partner with product and engineering teams to onboard and offboard workloads with clear demand modeling and cost accountability
Resiliency & Disaster Recovery
• Validate hot/warm/cold failover capacity without compromising steady-state performance
• Conduct criticality analysis to prioritize high-impact services
• Reserve quota and buffer capacity for disaster recovery and maintenance scenarios
• Align resiliency strategy to FedRAMP High expectations and government regulatory standards
Financial Stewardship & Demand Management
• Own reservation strategy modeling and commitment planning across Azure services
• Balance resilience, performance, and cost using Reservations, Savings Plans, and rightsizing strategies
• Forecast demand based on product roadmaps and business growth
• Drive financial stewardship through optimized commitment-based purchasing
• Deliver executive-ready dashboards covering utilization, saturation, headroom, scaling efficiency, DR readiness, and cost-to-performance metrics
Compliance & Governance
• Contribute to audit evidence and regulatory artifacts (SSP updates, control narratives, POA&M tracking, continuous monitoring artifacts)
• Enforce U.S. and U.S. Territories-only processing, storage, backup, and disaster recovery requirements
• Participate in Change Advisory Boards (CAB) as the capacity and resiliency representative
• Maintain configuration governance discipline across Azure services
Required Experience
• 10–12+ years in infrastructure capacity planning and performance engineering
• Expert-level Azure knowledge
• Demonstrated capacity modeling and forecasting expertise
• Strong experience with AKS scaling, throughput, and performance optimization
• Deep understanding of Azure cost structures, reservations, and financial modeling
• Hands-on experience operating in FedRAMP High or similarly regulated government environments (required)
• Strong understanding of resiliency patterns and disaster recovery options
• Familiarity with cloud security controls, firewalls, and compliance constraints
• Experience working with Infrastructure as Code and CI/CD gated deployments
• Strong problem-solving ability and executive communication skills
Preferred Experience
• Financial services background
• Government program experience
• Hands-on experience with Terraform or Bicep
• Knowledge of Ansible
• Working knowledge of Azure networking (VNets, routing, firewall flows, private endpoints)
• Experience with cloud deployment methodologies
Technical Environment
• Azure Monitor, Log Analytics, Metrics, Advisor, Cost Management
• Reservations, Savings Plans, Quota Management
• AKS autoscaling (HPA/VPA), VMSS and App Service autoscaling
• Prometheus, Grafana
• Azure SQL Database / Azure SQL Managed Instance
• Azure Firewall, NSGs, Private Endpoints, Bastion
• Azure Key Vault / Managed HSM
• Terraform, Bicep, ARM templates
• CI/CD gated deployments
• Azure Policy and governance initiatives






