Lorien

Lead PySpark Developer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Lead PySpark Developer on a contract until 30 June 2026, offering remote work. Key requirements include 5+ years of hands-on PySpark experience and AWS data services expertise; a financial services background is preferred.
🌎 - Country
United Kingdom
πŸ’± - Currency
Β£ GBP
πŸ’° - Day rate
Unknown
πŸ—“οΈ - Date
February 14, 2026
πŸ•’ - Duration
More than 6 months
🏝️ - Location
Remote
πŸ“„ - Contract
Unknown
πŸ”’ - Security
Unknown
πŸ“ - Location detailed
United Kingdom
🧠 - Skills detailed
#Base #Version Control #Distributed Computing #Data Pipeline #"ETL (Extract, Transform, Load)" #Athena #Logging #Macros #Scala #GIT #SAS #Data Mart #Migration #Data Architecture #AWS (Amazon Web Services) #Data Processing #Big Data #Spark (Apache Spark) #DevOps #PySpark #IAM (Identity and Access Management) #Python #Scrum #Cloud #Agile #S3 (Amazon Simple Storage Service) #Data Quality
Role description
Lead PySpark Developer

Location: Remote (office days in FCS for the client's team – no client site travel required)
Contract: Until 30 June 2026

We are seeking an experienced Lead PySpark Developer to design, develop, and optimise complex data processing solutions on AWS. This is a hands-on engineering role focused on modernising legacy SAS workflows and delivering scalable, production-ready PySpark data pipelines within a financial services environment.

Role Overview
You will lead the development and optimisation of PySpark-based ETL/ELT solutions, supporting large-scale SAS-to-PySpark migration initiatives. The role requires deep expertise in distributed data processing, Spark optimisation, and AWS data services, combined with strong engineering discipline and clean coding practices.

Key Responsibilities

Technical Delivery
• Design, develop, and optimise complex PySpark ETL/ELT and data mart solutions
• Convert legacy SAS code to PySpark using SAS2PY and manual refactoring
• Refactor and stabilise legacy data workflows into modern AWS-based architectures
• Ensure high standards of data quality, reliability, and performance

AWS & Cloud Engineering
• Build and deploy scalable data pipelines using AWS services (EMR, Glue, S3, Athena)
• Optimise Spark workloads for performance and cost efficiency
• Work within Git-based version control and CI/CD pipelines

Data & Engineering Excellence
• Apply clean coding standards, modular design, logging, parameterisation, and exception handling (see the ETL sketch after this description)
• Troubleshoot and optimise distributed Spark workloads – partitioning, tuning, execution plans (see the tuning sketch below)
• Develop and execute ETL test cases and validation processes

Required Skills

Core (Role Profile Alignment)
• P3 – PySpark (5+ years' hands-on experience)
• P3 – AWS data services (EMR, Glue, S3, Athena, IAM)
• P1 – SAS (Base SAS, Macros, SAS DI Studio – legacy code understanding and migration)

Technical Expertise
• Strong Spark performance tuning and optimisation experience
• Deep understanding of ETL/ELT, data warehousing, SCDs, dimensions and facts
• Strong Python coding and refactoring capability
• Experience with distributed computing and big data architectures
• Proficiency with Git workflows and CI/CD practices

Desirable Experience
• Banking or financial services background
• Experience of SAS modernisation or cloud migration programmes
• Exposure to DevOps tooling and Agile/Scrum delivery

This is an excellent opportunity to play a key role in a large-scale data transformation programme, delivering high-performance cloud-based solutions within a collaborative and forward-thinking engineering team.
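To give a flavour of the clean-coding practices listed under Data & Engineering Excellence, the following is a minimal, illustrative PySpark sketch of a parameterised ETL step with logging and exception handling. It is not part of the client's codebase: the paths, column names, and job structure are hypothetical assumptions made purely for illustration.

```python
# Illustrative sketch only: a parameterised PySpark ETL step with logging
# and exception handling. Paths and column names are hypothetical.
import argparse
import logging

from pyspark.sql import SparkSession, functions as F

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl_job")


def run(source_path: str, target_path: str, run_date: str) -> None:
    spark = SparkSession.builder.appName("example-etl").getOrCreate()
    try:
        logger.info("Reading source data from %s", source_path)
        df = spark.read.parquet(source_path)

        # Example transform: restrict to the run date and derive a column.
        out = (
            df.filter(F.col("business_date") == run_date)
              .withColumn("amount_gbp", F.col("amount") * F.col("fx_rate"))
        )

        logger.info("Writing output to %s", target_path)
        out.write.mode("overwrite").partitionBy("business_date").parquet(target_path)
    except Exception:
        logger.exception("ETL step failed for run_date=%s", run_date)
        raise
    finally:
        spark.stop()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--source-path", required=True)
    parser.add_argument("--target-path", required=True)
    parser.add_argument("--run-date", required=True)
    args = parser.parse_args()
    run(args.source_path, args.target_path, args.run_date)
```

Parameterising paths and the run date in this way keeps a job reusable across environments (for example, EMR or Glue submissions driven from CI/CD) without hard-coded values.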
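The tuning sketch below illustrates the kind of Spark optimisation work mentioned in the responsibilities – sizing shuffle partitions, broadcasting a small dimension table, and inspecting the execution plan. Again, this is a generic example under assumed table names and S3 locations, not the client's actual pipeline.

```python
# Illustrative Spark tuning sketch (assumes Spark 3.x): shuffle partition
# sizing, a broadcast join, and plan inspection. Table names and S3 paths
# are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning-example").getOrCreate()

# Size shuffle partitions to the data volume rather than the default of 200.
spark.conf.set("spark.sql.shuffle.partitions", "64")

facts = spark.read.parquet("s3://example-bucket/facts/")       # large fact table
dims = spark.read.parquet("s3://example-bucket/dim_account/")  # small dimension

# Broadcast the small dimension to avoid shuffling the large fact table.
joined = facts.join(F.broadcast(dims), on="account_id", how="left")

# Repartition by the write key so output files align with the partitioning.
result = joined.repartition("business_date")

# Inspect the physical plan before running at scale.
result.explain(mode="formatted")

result.write.mode("overwrite").partitionBy("business_date").parquet(
    "s3://example-bucket/marts/account_daily/"
)
```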