

Resilience, Testability & Scalability Lead
Featured Role | Apply direct with Data Freelance Hub
This role is for a Resilience, Testability & Scalability Lead, offering a hybrid work location in Fort Mill, SC/New York/New Jersey. The contract length and pay rate are unspecified. Key skills include AWS, disaster recovery, test automation, and familiarity with FINRA/SIP compliance.
Country
United States
Currency
$ USD
Day rate
440
Date discovered
August 2, 2025
Project duration
Unknown
Location type
Hybrid
Contract type
Unknown
Security clearance
Unknown
Location detailed
New York, United States
Skills detailed
#Prometheus #Observability #Data Management #Load Balancing #Presto #Pytest #Logging #Apache Spark #Automation #DevOps #Kafka (Apache Kafka) #Data Processing #Indexing #Replication #Scala #"ETL (Extract, Transform, Load)" #Unit Testing #Monitoring #Anomaly Detection #Cloud #Disaster Recovery #Security #AWS (Amazon Web Services) #Code Reviews #Deployment #Spark (Apache Spark) #AutoScaling #API (Application Programming Interface) #Redis #Compliance #Docker
Role description
Job Title: Resilience, Testability & Scalability Lead
Location: Fort Mill, SC / New York / New Jersey (Hybrid)
Key Responsibilities:
• Design and implement high-availability and failover strategies across multi-zone AWS deployments
• Lead the development and execution of disaster recovery and business continuity plans, including RTO/RPO validation and cross-region strategies
• Define testability strategies, test data management frameworks, and performance testing protocols
• Enable infrastructure and application resilience by introducing circuit breakers, retry patterns, service meshes, and graceful degradation mechanisms
• Establish real-time monitoring, alerting, and log aggregation frameworks using tools like CloudWatch and Prometheus
• Drive test automation and quality engineering best practices, integrating with CI/CD pipelines
• Optimize application and data layer performance through query tuning, caching, and indexing strategies
• Scale data processing using distributed frameworks like Apache Spark, and implement event-driven stream processing with Kafka
• Collaborate with platform, DevOps, and SRE teams to ensure resource efficiency, cost control, and performance SLAs
• Contribute to regulatory readiness by enforcing security, encryption, and audit logging standards
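For illustration, the circuit-breaker pattern named in the responsibilities above can be sketched in a few lines. This is a minimal, framework-free Python sketch of the idea, not a reference to any specific library the role uses:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after `max_failures` consecutive
    errors, then rejects calls until `reset_timeout` seconds pass,
    at which point it allows one trial ("half-open") call."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: call rejected")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

In production this logic typically lives in a service mesh sidecar or a resilience library rather than hand-rolled application code.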
Required Skills & Experience:
• Infrastructure Resilience & DR:
• Multi-AZ deployments, auto-scaling, load balancing, circuit breakers
• Disaster recovery design: backup/restore, cross-region replication, RTO/RPO
• Monitoring & Observability:
• Experience with CloudWatch, Prometheus, log aggregators
• Set up alerting for incident response, latency, throughput, and error rates
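As a sketch of what an error-rate alert encodes: CloudWatch alarms and Prometheus alerting rules express this declaratively, but the underlying logic is just a rolling-window threshold, shown here in plain Python for illustration:

```python
from collections import deque

class ErrorRateAlert:
    """Tracks the last `window` request outcomes and fires when the
    error rate exceeds `threshold` -- the logic a Prometheus alert
    rule or CloudWatch alarm expresses declaratively."""

    def __init__(self, window=100, threshold=0.05):
        self.outcomes = deque(maxlen=window)  # True = success
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    def firing(self) -> bool:
        if not self.outcomes:
            return False
        errors = sum(1 for ok in self.outcomes if not ok)
        return errors / len(self.outcomes) > self.threshold
```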
• Application Resilience & Security:
• Error handling, service degradation, exponential backoff
• Security best practices: IAM policies, encryption at rest/in transit
• Familiarity with FINRA/SIP compliance standards (preferred)
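The exponential-backoff pattern listed above can be sketched as a small retry helper. This is an illustrative Python version with jitter; the `sleep` parameter is injectable purely so the behaviour can be tested without waiting:

```python
import random
import time

def retry_with_backoff(fn, retries=4, base=0.5, cap=8.0, sleep=time.sleep):
    """Call fn, retrying on exception with jittered exponential
    backoff: delay = min(cap, base * 2**attempt) scaled by a random
    factor, so concurrent clients don't retry in lockstep."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            delay = min(cap, base * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # add jitter
```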
• Test Automation & Quality:
• Unit testing (e.g., pytest), integration testing, E2E automation
• Test data generation, synthetic data, environment provisioning
• Performance testing using JMeter, Gatling, stress and capacity testing
• Code reviews, static analysis, data validation, anomaly detection
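To illustrate the pairing of synthetic test data with pytest-style unit tests mentioned above, here is a minimal sketch; the record shape (`order_id`, `symbol`, `qty`, `price`) is hypothetical, chosen only to suggest a trading-adjacent domain:

```python
import random
import string

def make_synthetic_orders(n, seed=0):
    """Deterministic synthetic order records for test environments:
    seeding the RNG makes test runs reproducible."""
    rng = random.Random(seed)
    return [
        {
            "order_id": "".join(rng.choices(string.ascii_uppercase, k=8)),
            "symbol": rng.choice(["AAPL", "MSFT", "GOOG"]),
            "qty": rng.randint(1, 1000),
            "price": round(rng.uniform(10, 500), 2),
        }
        for _ in range(n)
    ]

# pytest collects functions named test_*; plain assert is the idiom
def test_orders_are_deterministic():
    assert make_synthetic_orders(5) == make_synthetic_orders(5)

def test_order_fields_valid():
    for order in make_synthetic_orders(100):
        assert 1 <= order["qty"] <= 1000
        assert 10 <= order["price"] <= 500
```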
• Scalability & Optimization:
• Horizontal scaling using Kubernetes, Docker, service discovery
• API Gateway, caching layers (Redis, Memcached), DB partitioning
• Connection pooling, capacity planning, cost-aware architecture
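As a sketch of the caching-layer idea above: the cache-aside pattern that Redis or Memcached would back in production reduces to "return fresh entry, else reload and store". This in-process Python version (single-process only, with an injectable clock for testability) illustrates the logic:

```python
import time

class TTLCache:
    """In-process cache-aside with per-entry TTL -- a stand-in for
    what Redis SET-with-expiry provides across processes."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (value, stored_at)

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]          # fresh hit
        value = loader(key)          # miss or expired: hit the backend
        self._store[key] = (value, now)
        return value
```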
• Data & Stream Processing:
• Spark cluster management, parallel processing, big data optimization
• Kafka-based messaging, windowing, and aggregation for real-time data
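The windowing and aggregation semantics mentioned above can be illustrated without a running cluster. This sketch shows tumbling (fixed, non-overlapping) windows over a plain list of timestamped events; Kafka Streams or Spark Structured Streaming provide the same semantics over live streams:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_secs):
    """Aggregate (timestamp, key) events into fixed, non-overlapping
    (tumbling) windows, keyed by (window_start, key)."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_secs) * window_secs
        counts[(window_start, key)] += 1
    return dict(counts)
```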