Queen Square Recruitment

Developer

This role is for a Developer on a contract of unspecified length, offering a competitive pay rate. Work is remote, requiring strong skills in Python, PySpark, Azure Cosmos DB, and experience with financial datasets and bi-temporal data models.
🌎 - Country
United Kingdom
💱 - Currency
£ GBP
💰 - Day rate
Unknown
🗓️ - Date
May 27, 2026
🕒 - Duration
Unknown
🏝️ - Location
Unknown
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
London Area, United Kingdom
🧠 - Skills detailed
#Data Lake #Data Architecture #Vault #Microsoft Power BI #NoSQL #Datasets #Pytest #Artifactory #Data Processing #GDPR (General Data Protection Regulation) #Storage #Spark SQL #Observability #API (Application Programming Interface) #Azure CLI (Azure Command Line Interface) #Azure DevOps #Deployment #SQL (Structured Query Language) #PySpark #KQL (Kusto Query Language) #Python #Azure Cosmos DB #Integration Testing #DevOps #Azure #ETL (Extract, Transform, Load) #Delta Lake #Datadog #Indexing #Data Governance #ADLS (Azure Data Lake Storage) #Compliance #GitLab #Scala #YAML (YAML Ain't Markup Language) #Azure ADLS (Azure Data Lake Storage) #Spark (Apache Spark) #Data Pipeline #Monitoring #Batch #BI (Business Intelligence) #CLI (Command-Line Interface) #Cloud #SonarQube #JSON (JavaScript Object Notation) #Data Quality
Role description
The Role

You will be part of a specialist engineering team responsible for designing, building, and optimising end-to-end financial instrument mastering pipelines. These pipelines span ingestion, normalisation, bi-temporal processing, and publication into enterprise data platforms. You will work closely with data architects, domain experts, and QC engineers to deliver scalable, reliable, and high-performance data solutions across Azure and Microsoft Fabric ecosystems.

Key Responsibilities

• Build and maintain PySpark-based data pipelines for financial instrument mastering across multiple data sources
• Design and implement bi-temporal data processing models (system time + valid time) including Slice, Resolve, Coalesce, and Diff logic
• Develop optimised Azure Cosmos DB data models, including partitioning, indexing, change feed processing, and point-read optimisation
• Integrate external APIs for entity resolution and matching services (PermID / IAAS) with robust retry and batching mechanisms
• Design publication pipelines to convert bi-temporal data into uni-temporal outputs and publish via Microsoft Fabric / Parquet-based lakehouse architectures
• Implement data quality frameworks using Great Expectations to ensure accuracy and compliance
• Build robust unit and integration tests using PyTest for PySpark and Cosmos DB components
• Support and maintain CI/CD pipelines (GitLab CI) including Python packaging, Artifactory deployment, and ARM-based infrastructure provisioning
• Work with YAML-driven configuration for mastering rules, schemas, and environment setup
• Monitor and troubleshoot production pipelines using Eventstream telemetry, KQL, and DataDog observability tools
• Deliver scalable transformation logic, optimised aggregations, and high-performance data processing workflows
• Implement data governance controls including data masking, role-based access, and compliance policies
• Continuously tune and optimise workloads for performance, cost efficiency, and reliability

Required Skills & Experience

• Strong experience in Python and PySpark (Spark SQL, DataFrame API, Structured Streaming)
• Hands-on experience building large-scale ETL / streaming data pipelines
• Experience working with Azure Cosmos DB (NoSQL) including data modelling and performance tuning
• Strong knowledge of Azure Data Lake Storage (ADLS / OneLake / ABFS)
• Experience implementing bi-temporal or SCD Type 2 data models
• Strong understanding of data quality frameworks (e.g., Great Expectations)
• Experience with CI/CD pipelines (GitLab / Azure DevOps) and automated deployments
• Strong testing discipline using PyTest, mocking, and integration testing approaches
• Experience working with YAML/JSON configuration and infrastructure-as-code (ARM templates)
• Strong understanding of distributed data processing and Spark-based architectures
• Experience working with financial or time-series datasets (market data, reference data, risk data preferred)
• Strong communication skills and ability to work with cross-functional stakeholders

Desirable Experience

• Microsoft Fabric (Notebooks, Eventstream, Lakehouses, Spark Job Definitions)
• Financial instrument/reference data (ISIN, CUSIP, LEI, PermID)
• Entity resolution / matching systems and enrichment APIs
• Delta Lake and Change Data Feed (CDF)
• Cosmos DB performance optimisation (RU tuning, bulk operations, concurrency)
• Jinja2 templating or code generation approaches
• SonarQube or similar code quality tooling
• Monorepo development with modern Python packaging tools (uv / Hatchling)
• Power BI / semantic modelling experience
• Knowledge of financial compliance standards (GDPR, SOX)

Technology Stack

Python 3.11+, PySpark 3.5, Spark SQL
Azure Cosmos DB, ADLS, OneLake, Delta Lake, Parquet
Microsoft Fabric (Eventstream, Notebooks, Lakehouse)
Great Expectations, LSEG Data Validation frameworks
GitLab CI/CD, JFrog Artifactory, ARM Templates
DataDog, Eventstream, KQL monitoring
Azure Key Vault, Azure CLI, Fabric APIs

Why Join

• Work on a global financial markets transformation programme
• Hands-on with next-generation Azure + Fabric data platforms
• Exposure to bi-temporal modelling and financial instrument mastering systems
• High-impact engineering role with modern cloud and streaming architecture
• Opportunity to work with leading domain and technical experts in a regulated environment