Annapurna

Data Engineer

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Engineer on a 6-month freelance contract, hybrid in London, starting ASAP. Key skills include strong production experience with Apache Spark and Python, and expertise in large-scale data ingestion and ETL pipelines.
🌎 - Country
United Kingdom
💱 - Currency
£ GBP
-
💰 - Day rate
Unknown
-
🗓️ - Date
July 2, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
Hybrid
-
📄 - Contract
Inside IR35
-
🔒 - Security
Unknown
-
📍 - Location detailed
London Area, United Kingdom
-
🧠 - Skills detailed
#Airflow #Java #Databricks #Data Processing #Storage #Python #Scala #AI (Artificial Intelligence) #Batch #ML (Machine Learning) #Spark (Apache Spark) #Data Ingestion #Delta Lake #ETL (Extract, Transform, Load) #Apache Spark #Debugging #Datasets #Data Science #Data Engineering
Role description
Data Ingestion Engineer (Inside IR35) / Hybrid London / 6-month Freelance Contract / Start ASAP

Key Responsibilities:
You will work with the Data Ingestion team to improve the reliability, efficiency and throughput of the pipelines that move real-world driving data through the client.
● Debug and fix failing or blocked ingestion pipelines.
● Investigate issues caused by corrupt, malformed or unexpected data.
● Help make our pipelines more resilient so individual bad data segments do not block wider workflows.
● Improve how we handle varied data formats from partners, suppliers and third-party sources.
● Support orchestration across multi-step ingestion workflows, including dependencies, retries and queue management.
● Optimise Spark jobs and data processing pipelines for throughput and compute efficiency.
● Reduce operational toil around failed jobs, stalled pipelines and manual interventions.
● Work on high-volume batch processing systems where throughput, reliability and cost all matter.
● Help prioritise and unblock important datasets for downstream annotation, data science and model training teams.
● Partner with permanent engineers to keep critical ingestion work moving while longer-term platform improvements are developed.
● Contribute pragmatic improvements that make the system easier to operate, scale and trust.

Essential:
● Strong production experience with Apache Spark.
● Strong Python engineering experience.
● Experience building, debugging or operating large-scale data ingestion, ETL or data processing pipelines.
● Experience with distributed data processing systems.
● Ability to optimise jobs for throughput, compute efficiency and reliability.
● Experience debugging production pipeline failures.
● Comfort working with messy, corrupt, incomplete or inconsistent data.
● Understanding of orchestration across multi-step pipelines and downstream dependencies.
● Ability to work independently in a fast-moving, highly technical environment.
● A practical, delivery-focused mindset and the ability to ramp up quickly.
● Experience working at significant data scale, ideally PB-scale or similarly high-throughput environments.

Desirable:
Experience in one or more of the following areas would be a strong advantage:
● Airflow, Flyte, Databricks workflows or similar orchestration tooling.
● Databricks, Delta Lake or Delta tables.
● Scala or Java, especially in Spark-based environments.
● Queue-based processing, retry handling and priority data workflows.
● High-throughput batch data processing systems.
● Production systems with many data producers, consumers or external data sources.
● Handling third-party, partner or supplier data with inconsistent formats and quality issues.
● Automotive, robotics, autonomy, mapping, ML data platforms or embodied AI environments.
● Cost optimisation for compute- and storage-heavy data platforms.
● High-performance engineering experience from domains such as trading, where it includes relevant distributed systems or throughput-focused work.