

Gazelle Global
GenAI Data Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a GenAI Data Engineer; the contract length and pay rate are unspecified. Key skills include PySpark, Python, AWS, and GenAI model expertise, with a focus on scalable data pipelines and ETL processes.
🌎 - Country
United Kingdom
💱 - Currency
£ GBP
-
💰 - Day rate
Unknown
-
🗓️ - Date
April 28, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Unknown
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
London Area, United Kingdom
-
🧠 - Skills detailed
#Data Processing #Databases #ML (Machine Learning) #Indexing #AWS (Amazon Web Services) #Data Engineering #Data Pipeline #Automation #Datasets #S3 (Amazon Simple Storage Service) #Redshift #Schema Design #Data Governance #Lambda (AWS Lambda) #Delta Lake #Snowflake #Python #PySpark #SQL (Structured Query Language) #ETL (Extract, Transform, Load) #Data Storage #AI (Artificial Intelligence) #Model Optimization #Spark (Apache Spark) #Distributed Computing #Storage #DynamoDB #Scala
Role description
Your responsibilities:
• Design and maintain scalable data pipelines using PySpark, Python, and distributed computing frameworks to support high-volume data processing (a minimal pipeline sketch follows this list).
• Architect and optimize AWS-based data and AI infrastructure, ensuring secure, performant, and cost‑efficient ingestion, transformation, and storage.
• Develop, fine-tune, benchmark, and evaluate GenAI/LLM models, including custom training and inference optimization.
• Implement and maintain RAG pipelines, vector databases, and document-processing workflows for enterprise GenAI applications.
• Build reusable frameworks for prompt management, evaluation, and GenAI operations.
• Collaborate with cross-functional teams to integrate GenAI capabilities into production systems and ensure high-quality data, governance, and operational reliability.
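For illustration, here is a minimal sketch of the kind of PySpark ETL pipeline the first responsibility describes; the bucket paths and column names (event_id, event_ts) are hypothetical, not details from the role.

```python
# Minimal PySpark ETL sketch: ingest raw JSON from S3, clean it, and write
# partitioned Parquet. All paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("genai-ingest").getOrCreate()

# Extract: read raw JSON events from a (hypothetical) landing bucket.
raw = spark.read.json("s3://example-raw-bucket/events/")

# Transform: drop malformed rows, normalise timestamps, derive a partition key.
clean = (
    raw.filter(F.col("event_id").isNotNull())
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
)

# Load: write date-partitioned Parquet for downstream consumers.
(clean.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-curated-bucket/events/"))
```

Partitioning the curated output by date is one common way to keep downstream scans performant and cost-efficient at high volume.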
Your Profile
Essential skills/knowledge/experience:
• Strong experience with PySpark, distributed data processing, and large-scale ETL/ELT pipelines.
• Strong SQL expertise, including star/snowflake schema design, indexing strategies, writing optimized queries, and implementing CDC and SCD Type 1/2/3 patterns for reliable data warehousing (see the SCD Type 2 sketch after this list).
• Advanced proficiency in Python for data engineering, automation, and ML/GenAI integration.
• Hands‑on expertise with AWS services (S3, Glue, Lambda, EMR, Bedrock / custom model hosting).
• Practical experience with GenAI/LLM model creation, fine-tuning, benchmarking, and evaluation.
• Solid understanding of RAG architectures, embeddings, vector stores, and LLM evaluation methods (a minimal retrieval sketch appears after the desirable-skills list).
• Experience working with structured and unstructured datasets (documents, logs, text, images).
• Familiarity with scalable data storage solutions (Delta Lake, Parquet, Redshift, DynamoDB).
• Understanding of model optimization techniques (quantization, distillation, inference optimization).
• Strong capability to debug, tune, and optimize distributed systems and AI pipelines.
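Because SCD Type 2 is called out explicitly, here is a minimal sketch of one common approach: a Delta Lake MERGE (Delta Lake is named in the storage bullet above). The table paths and the customer_id/address/effective_date columns are hypothetical.

```python
# SCD Type 2 sketch via Delta Lake MERGE, assuming the delta-spark package
# and a Spark session with Delta enabled. Paths and columns are assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

dim = DeltaTable.forPath(spark, "s3://example-warehouse/dim_customer")
changes = spark.read.parquet("s3://example-staging/customer_changes/")

# Step 1: close out current rows whose tracked attribute changed.
(dim.alias("t")
    .merge(changes.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.address <> s.address",
        set={"is_current": "false", "end_date": "s.effective_date"})
    .execute())

# Step 2: append the new current versions. A production job would append
# only rows that actually changed and manage start dates as well.
new_versions = (changes
                .withColumn("is_current", F.lit(True))
                .withColumn("end_date", F.lit(None).cast("date")))
(new_versions.write
             .format("delta")
             .mode("append")
             .save("s3://example-warehouse/dim_customer"))
```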
Desirable skills/knowledge/experience (as applicable):
• PySpark, Python, SQL, AWS, GenAI
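To ground the RAG bullet above, here is a minimal, self-contained retrieval sketch. The toy character-frequency embedding and the in-memory index are stand-ins for a real embedding model (e.g. one hosted on Bedrock) and a vector database; the sample documents are invented.

```python
# Minimal RAG retrieval sketch: embed documents, index them in memory, and
# retrieve the top-k passages for a query by cosine similarity.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy character-frequency embedding; a real system would call an
    embedding model here instead."""
    vec = np.zeros(128)
    for ch in text.lower():
        vec[ord(ch) % 128] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# "Vector store": an in-memory matrix of unit-norm document embeddings.
documents = [
    "Refunds are processed within five business days.",
    "Contracts auto-renew unless cancelled 30 days in advance.",
    "Support is available 24/7 via the customer portal.",
]
index = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    scores = index @ embed(query)  # unit-norm vectors, so dot = cosine
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

# The retrieved passages would then be injected into the LLM prompt as context.
print(retrieve("how long do refunds take?"))
```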






