

Annapurna
Data Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Engineer on a 6-month freelance contract, hybrid in London, starting ASAP. Key skills include strong production experience with Apache Spark and Python, and expertise in large-scale data ingestion and ETL pipelines.
🌎 - Country
United Kingdom
💱 - Currency
£ GBP
-
💰 - Day rate
Unknown
-
🗓️ - Date
July 2, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
Hybrid
-
📄 - Contract
Inside IR35
-
🔒 - Security
Unknown
-
📍 - Location detailed
London Area, United Kingdom
-
🧠 - Skills detailed
#Airflow #Java #Databricks #Data Processing #Storage #Python #Scala #AI (Artificial Intelligence) #Batch #ML (Machine Learning) #Spark (Apache Spark) #Data Ingestion #Delta Lake #ETL (Extract, Transform, Load) #Apache Spark #Debugging #Datasets #Data Science #Data Engineering
Role description
Data Ingestion Engineer (Inside IR35) / Hybrid London / 6-month Freelance Contract / Start ASAP
Key Responsibilities:
You will work with the Data Ingestion team to improve the reliability, efficiency and throughput of the pipelines that move real-world driving data through the client.
● Debug and fix failing or blocked ingestion pipelines.
● Investigate issues caused by corrupt, malformed or unexpected data.
● Help make our pipelines more resilient so individual bad data segments do not block wider workflows.
● Improve how we handle varied data formats from partners, suppliers and third-party sources.
● Support orchestration across multi-step ingestion workflows, including dependencies, retries and queue management.
● Optimise Spark jobs and data processing pipelines for throughput and compute efficiency.
● Reduce operational toil around failed jobs, stalled pipelines and manual interventions.
● Work on high-volume batch processing systems where throughput, reliability and cost all matter.
● Help prioritise and unblock important datasets for downstream annotation, data science and model training teams.
● Partner with permanent engineers to keep critical ingestion work moving while longer-term platform improvements are developed.
● Contribute pragmatic improvements that make the system easier to operate, scale and trust.
Essential:
● Strong production experience with Apache Spark.
● Strong Python engineering experience.
● Experience building, debugging or operating large-scale data ingestion, ETL or data processing pipelines.
● Experience with distributed data processing systems.
● Ability to optimise jobs for throughput, compute efficiency and reliability.
● Experience debugging production pipeline failures.
● Comfort working with messy, corrupt, incomplete or inconsistent data.
● Understanding of orchestration across multi-step pipelines and downstream dependencies.
● Ability to work independently in a fast-moving, highly technical environment.
● A practical, delivery-focused mindset and the ability to ramp quickly.
● Experience working at significant data scale, ideally PB-scale or similarly high-throughput environments.
Desirable:
Experience in one or more of the following areas would be a strong advantage:
● Airflow, Flyte, Databricks workflows or similar orchestration tooling.
● Databricks, Delta Lake or Delta tables.
● Scala or Java, especially in Spark-based environments.
● Queue-based processing, retry handling and priority data workflows.
● High-throughput batch data processing systems.
● Production systems with many data producers, consumers or external data sources.
● Handling third-party, partner or supplier data with inconsistent formats and quality issues.
● Automotive, robotics, autonomy, mapping, ML data platforms or embodied AI environments.
● Cost optimisation for compute- and storage-heavy data platforms.
● High-performance engineering experience from domains such as trading, where it includes relevant distributed systems or throughput-focused work.






