

Queen Square Recruitment
Developer
This role is for a Developer on a contract of unspecified length, offering a competitive pay rate. Work is remote, requiring strong skills in Python, PySpark, Azure Cosmos DB, and experience with financial datasets and bi-temporal data models.
🌎 - Country
United Kingdom
💱 - Currency
£ GBP
💰 - Day rate
Unknown
🗓️ - Date
May 27, 2026
🕒 - Duration
Unknown
🏝️ - Location
Unknown
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
London Area, United Kingdom
🧠 - Skills detailed
#Data Lake #Data Architecture #Vault #Microsoft Power BI #NoSQL #Datasets #Pytest #Artifactory #Data Processing #GDPR (General Data Protection Regulation) #Storage #Spark SQL #Observability #API (Application Programming Interface) #Azure CLI (Azure Command Line Interface) #Azure DevOps #Deployment #SQL (Structured Query Language) #PySpark #KQL (Kusto Query Language) #Python #Azure Cosmos DB #Integration Testing #DevOps #Azure #ETL (Extract, Transform, Load) #Delta Lake #Datadog #Indexing #Data Governance #ADLS (Azure Data Lake Storage) #Compliance #GitLab #Scala #YAML (YAML Ain't Markup Language) #Azure ADLS (Azure Data Lake Storage) #Spark (Apache Spark) #Data Pipeline #Monitoring #Batch #BI (Business Intelligence) #CLI (Command-Line Interface) #Cloud #SonarQube #JSON (JavaScript Object Notation) #Data Quality
Role description
The Role
You will be part of a specialist engineering team responsible for designing, building, and optimising end-to-end financial instrument mastering pipelines. These pipelines span ingestion, normalisation, bi-temporal processing, and publication into enterprise data platforms.
You will work closely with data architects, domain experts, and QC engineers to deliver scalable, reliable, and high-performance data solutions across Azure and Microsoft Fabric ecosystems.
Key Responsibilities
• Build and maintain PySpark-based data pipelines for financial instrument mastering across multiple data sources
• Design and implement bi-temporal data processing models (system time + valid time) including Slice, Resolve, Coalesce, and Diff logic
• Develop optimised Azure Cosmos DB data models, including partitioning, indexing, change feed processing, and point-read optimisation
• Integrate external APIs for entity resolution and matching services (PermID / IAAS) with robust retry and batching mechanisms
• Design publication pipelines to convert bi-temporal data into uni-temporal outputs and publish via Microsoft Fabric / Parquet-based lakehouse architectures
• Implement data quality frameworks using Great Expectations to ensure accuracy and compliance
• Build robust unit and integration tests using pytest for PySpark and Cosmos DB components
• Support and maintain CI/CD pipelines (GitLab CI) including Python packaging, Artifactory deployment, and ARM-based infrastructure provisioning
• Work with YAML-driven configuration for mastering rules, schemas, and environment setup
• Monitor and troubleshoot production pipelines using Eventstream telemetry, KQL, and Datadog observability tools
• Deliver scalable transformation logic, optimised aggregations, and high-performance data processing workflows
• Implement data governance controls including data masking, role-based access, and compliance policies
• Continuously tune and optimise workloads for performance, cost efficiency, and reliability
Required Skills & Experience
• Strong experience in Python and PySpark (Spark SQL, DataFrame API, Structured Streaming)
• Hands-on experience building large-scale ETL / streaming data pipelines
• Experience working with Azure Cosmos DB (NoSQL) including data modelling and performance tuning
• Strong knowledge of Azure Data Lake Storage (ADLS / OneLake / ABFS)
• Experience implementing bi-temporal or SCD Type 2 data models
• Strong understanding of data quality frameworks (e.g., Great Expectations)
• Experience with CI/CD pipelines (GitLab / Azure DevOps) and automated deployments
• Strong testing discipline using pytest, mocking, and integration testing approaches
• Experience working with YAML/JSON configuration and infrastructure-as-code (ARM templates)
• Strong understanding of distributed data processing and Spark-based architectures
• Experience working with financial or time-series datasets (market data, reference data, risk data preferred)
• Strong communication skills and ability to work with cross-functional stakeholders
Desirable Experience
• Microsoft Fabric (Notebooks, Eventstream, Lakehouses, Spark Job Definitions)
• Financial instrument/reference data (ISIN, CUSIP, LEI, PermID)
• Entity resolution / matching systems and enrichment APIs
• Delta Lake and Change Data Feed (CDF)
• Cosmos DB performance optimisation (RU tuning, bulk operations, concurrency)
• Jinja2 templating or code generation approaches
• SonarQube or similar code quality tooling
• Monorepo development with modern Python packaging tools (uv / Hatchling)
• Power BI / semantic modelling experience
• Knowledge of regulatory and compliance standards (GDPR, SOX)
Technology Stack
Python 3.11+, PySpark 3.5, Spark SQL
Azure Cosmos DB, ADLS, OneLake, Delta Lake, Parquet
Microsoft Fabric (Eventstream, Notebooks, Lakehouse)
Great Expectations, LSEG Data Validation frameworks
GitLab CI/CD, JFrog Artifactory, ARM Templates
Datadog, Eventstream, KQL monitoring
Azure Key Vault, Azure CLI, Fabric APIs
Why Join
• Work on a global financial markets transformation programme
• Hands-on with next-generation Azure + Fabric data platforms
• Exposure to bi-temporal modelling and financial instrument mastering systems
• High-impact engineering role with modern cloud and streaming architecture
• Opportunity to work with leading domain and technical experts in a regulated environment






