

Vallum Associates
Gen AI Data Engineer - PySpark/Python
Featured Role | Apply direct with Data Freelance Hub
This role is for a Gen AI Data Engineer with strong PySpark, Python, and SQL skills, requiring experience in large-scale ETL/ELT pipelines and AWS services. The contract is hybrid for 6 months, based in London or Edinburgh, UK.
Country
United Kingdom
Currency
£ GBP
Day rate
Unknown
Date
April 29, 2026
Duration
Unknown
Location
Hybrid
Contract
Inside IR35
Security
Unknown
Location detailed
London Area, United Kingdom
Skills detailed
#Lambda (AWS Lambda) #AI (Artificial Intelligence) #S3 (Amazon Simple Storage Service) #Indexing #DynamoDB #SQL (Structured Query Language) #ETL (Extract, Transform, Load) #Model Optimization #Delta Lake #PySpark #Datasets #Scala #Data Engineering #Data Storage #ML (Machine Learning) #Data Processing #Storage #Schema Design #Spark (Apache Spark) #Snowflake #Python #Redshift #AWS (Amazon Web Services) #Automation
Role description
The Role: GenAI Data Engineer
Location: London or Edinburgh, UK
Position Type: Contract (Inside IR35)
Remote work option available: Hybrid (2 days onsite)
Job Description:
Essential skills/knowledge/experience:
• Strong experience with PySpark, distributed data processing, and large-scale ETL/ELT pipelines.
• Strong SQL expertise including star/snowflake schema design, indexing strategies, writing optimized queries, and implementing CDC and SCD Type 1/2/3 patterns for reliable data warehousing (a minimal SCD Type 2 sketch follows this list).
• Advanced proficiency in Python for data engineering, automation, and ML/GenAI integration.
• Hands-on expertise with AWS services (S3, Glue, Lambda, EMR, Bedrock / custom model hosting).
• Practical experience with GenAI/LLM model creation, fine-tuning, benchmarking, and evaluation.
• Solid understanding of RAG architectures, embeddings, vector stores, and LLM evaluation methods (see the retrieval sketch at the end of this description).
• Experience working with structured and unstructured datasets (documents, logs, text, images).
• Familiarity with scalable data storage solutions (Delta Lake, Parquet, Redshift, DynamoDB).
• Understanding of model optimization techniques (quantization, distillation, inference optimization).
• Strong capability to debug, tune, and optimize distributed systems and AI pipelines.
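To make the SCD expectation above concrete, here is a minimal sketch of an SCD Type 2 upsert with PySpark and Delta Lake's merge API. All table and column names (customers_dim, customers_staging, customer_id, address) are hypothetical, and the snippet assumes a Spark session already configured for Delta Lake; it illustrates the general pattern, not this team's implementation.

```python
# Hypothetical SCD Type 2 upsert: expire changed rows, insert new versions.
# Table/column names are illustrative; assumes Delta Lake is configured.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

dim = DeltaTable.forName(spark, "customers_dim")   # current dimension table
updates = spark.table("customers_staging")         # incoming CDC batch

# Rows whose tracked attribute changed: these need a new "current" version.
changed = (
    updates.alias("u")
    .join(dim.toDF().alias("d"), "customer_id")
    .where("d.is_current = true AND u.address <> d.address")
    .selectExpr("u.*")
)

# Union trick: changed rows get a NULL merge key so they fall through to the
# insert clause, while every update row can still match and expire old versions.
staged = changed.selectExpr("NULL AS merge_key", "*").unionByName(
    updates.selectExpr("customer_id AS merge_key", "*")
)

(dim.alias("d")
 .merge(staged.alias("s"), "d.customer_id = s.merge_key")
 .whenMatchedUpdate(                     # expire the superseded version
     condition="d.is_current = true AND d.address <> s.address",
     set={"is_current": "false", "valid_to": "current_timestamp()"})
 .whenNotMatchedInsert(                  # add the new current version
     values={
         "customer_id": "s.customer_id",
         "address": "s.address",
         "is_current": "true",
         "valid_from": "current_timestamp()",
         "valid_to": "null",
     })
 .execute())
```

One merge pass covers every case: unchanged rows match but fail the update condition and are left alone, changed rows both expire their old version and insert a new one, and brand-new keys simply insert.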
Desirable skills/knowledge/experience:
• PySpark, Python, SQL, AWS, GenAI
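On the RAG side, the sketch below shows only the retrieval step: embed a small corpus, then rank documents by cosine similarity against a query. The embed() function here is a self-contained stand-in (deterministic pseudo-random vectors), not a real model; in practice it would call an embedding model (for example one hosted on Bedrock) and the vectors would live in a proper vector store.

```python
# Retrieval step of a RAG pipeline, reduced to its mechanics.
# embed() is a placeholder: real pipelines call an embedding model
# and store vectors in a vector database rather than a numpy array.
import hashlib
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Stand-in embedder: deterministic unit vectors derived from a hash.
    Carries no semantics; it only lets the sketch run self-contained."""
    vecs = []
    for t in texts:
        seed = int.from_bytes(hashlib.sha256(t.encode()).digest()[:4], "big")
        v = np.random.default_rng(seed).normal(size=384)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

documents = [
    "Spark partitions data across executors for parallel processing.",
    "Delta Lake adds ACID transactions on top of Parquet files.",
    "DynamoDB is a managed key-value store on AWS.",
]
doc_vecs = embed(documents)          # indexing: embed the corpus once

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents nearest the query by cosine similarity."""
    q = embed([query])[0]
    scores = doc_vecs @ q            # unit vectors, so dot product = cosine
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

# With a real embedding model, this query would surface the Delta Lake line.
print(retrieve("How does Delta Lake store data?"))
```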