

Randstad Digital
PySpark Lead Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a "PySpark Lead Engineer" on a contract basis, focused on migrating SAS analytics to PySpark on AWS. Key skills include PySpark, Python, AWS services, and SAS. Experience in financial services and data modeling is required.
🌎 - Country
United Kingdom
💱 - Currency
£ GBP
-
💰 - Day rate
Unknown
-
🗓️ - Date
March 11, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Unknown
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
London Area, United Kingdom
-
🧠 - Skills detailed
#Leadership #DevOps #Datasets #S3 (Amazon Simple Storage Service) #Cloud #IAM (Identity and Access Management) #AWS (Amazon Web Services) #Data Reconciliation #PySpark #Debugging #Scala #Data Modeling #Python #Vault #Data Mart #SAS #"ETL (Extract, Transform, Load)" #Lambda (AWS Lambda) #GitLab #Macros #Athena #Terraform #Migration #Data Vault #Spark (Apache Spark) #AWS EMR (Amazon Elastic MapReduce) #Base #Jenkins #Unit Testing #GIT
Role description
PySpark Lead Engineer
Contract
As the Technical Lead, you will drive the high-stakes migration of legacy SAS analytics to a modern, cloud-native PySpark ecosystem on AWS. This isn't just a lift and shift: you will refactor complex procedural logic into scalable, production-ready distributed pipelines for a Tier-1 financial services environment.
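To give a flavour of that refactoring work, here is a minimal sketch of how a procedural SAS DATA step maps onto the PySpark DataFrame API. The table, column and path names (transactions, amount, fee_band, the S3 locations) are hypothetical, not taken from the client's codebase.

```python
# Minimal sketch: a SAS DATA step refactored to PySpark.
# All names and paths below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sas_migration_sketch").getOrCreate()

# SAS equivalent:
#   data work.enriched;
#     set raw.transactions;
#     if amount >= 10000 then fee_band = "HIGH";
#     else if amount >= 1000 then fee_band = "MID";
#     else fee_band = "LOW";
#   run;
transactions = spark.read.parquet("s3://bucket/raw/transactions/")

enriched = transactions.withColumn(
    "fee_band",
    F.when(F.col("amount") >= 10000, "HIGH")
     .when(F.col("amount") >= 1000, "MID")
     .otherwise("LOW"),
)

enriched.write.mode("overwrite").parquet("s3://bucket/curated/transactions/")
```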
Core Responsibilities
• Engineering Leadership: Design and develop complex ETL/ELT pipelines and Data Marts using PySpark, EMR, and Glue.
• Legacy Modernisation: Architect the conversion of SAS Base/Macros into modular, testable Python code using SAS2PY and manual refactoring.
• Performance Tuning: Optimise Spark execution (partitioning, shuffling, caching) to ensure cost-efficient processing of massive financial datasets.
• Quality & Governance: Implement rigorous CI/CD, unit testing, and data reconciliation frameworks to ensure "penny-perfect" accuracy (see the reconciliation sketch after this list).
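On the reconciliation point, here is a hedged illustration of what "penny-perfect" checking can look like in practice: row counts, control totals and a key-level diff between the legacy SAS output and the migrated PySpark output. The dataset paths, key column and zero-tolerance rule are assumptions for the sketch, not the client's actual framework.

```python
# Illustrative reconciliation between legacy SAS output and migrated
# PySpark output. Paths, keys and tolerances are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("recon_sketch").getOrCreate()

legacy = spark.read.parquet("s3://bucket/legacy_sas/balances/")    # exported from SAS
migrated = spark.read.parquet("s3://bucket/curated/balances/")     # produced by PySpark

# 1. Row counts must match exactly.
assert legacy.count() == migrated.count(), "row count mismatch"

# 2. Control totals must match to the penny; summing a DecimalType
#    column (rather than a float) avoids rounding drift.
legacy_total = legacy.agg(F.sum("balance_gbp")).first()[0]
migrated_total = migrated.agg(F.sum("balance_gbp")).first()[0]
assert legacy_total == migrated_total, (
    f"total drift: legacy={legacy_total} migrated={migrated_total}"
)

# 3. Key-level diff: rows whose values differ between the two sides
#    (eqNullSafe treats NULL == NULL as a match).
diff = (legacy.alias("l")
        .join(migrated.alias("m"), on="account_id", how="full_outer")
        .where(~F.col("l.balance_gbp").eqNullSafe(F.col("m.balance_gbp"))))
assert diff.count() == 0, "per-account mismatches found"
```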
Technical Stack
• Engine: PySpark (Expert), Python (Clean Code/SOLID principles).
• AWS: EMR, Glue, S3, Athena, IAM, Lambda.
• Data Modeling: SCD Type 2, Fact/Dimension tables, Data Vault/Star Schema (an SCD Type 2 sketch follows this list).
• Legacy: Proficiency in reading/debugging SAS (Base, Macros, DI Studio).
• DevOps: Git-based workflows, Jenkins/GitLab CI, Terraform.
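As a hedged sketch of the SCD Type 2 pattern named above: expire the current dimension rows whose attributes have changed, then append new versions. The column names (customer_id, address, valid_from, valid_to, is_current) are assumptions chosen for illustration.

```python
# Sketch of an SCD Type 2 merge in PySpark. All column names and
# paths are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

dim = spark.read.parquet("s3://bucket/dim_customer/")          # existing dimension
incoming = spark.read.parquet("s3://bucket/staging/customer/") # today's snapshot

current = dim.where(F.col("is_current"))
changed = (current.alias("d")
           .join(incoming.alias("i"), "customer_id")
           .where(F.col("d.address") != F.col("i.address")))

# Close out the changed current rows.
expired = (changed.select("d.*")
           .withColumn("valid_to", F.current_date())
           .withColumn("is_current", F.lit(False)))

# Open new versions for the changed keys.
new_versions = (changed.select("i.*")
                .withColumn("valid_from", F.current_date())
                .withColumn("valid_to", F.lit(None).cast("date"))
                .withColumn("is_current", F.lit(True)))

# Reassemble: untouched rows + expired rows + new versions.
unchanged = dim.join(changed.select("customer_id"), "customer_id", "left_anti")
result = unchanged.unionByName(expired).unionByName(new_versions)
result.write.mode("overwrite").parquet("s3://bucket/dim_customer_v2/")
```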