

Infoplus Technologies UK Limited
Data Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Engineer; the contract length and pay rate are not specified. Key skills include 7+ years of AWS data engineering with S3, Glue, Athena, and OpenSearch, plus strong metadata modelling experience in scientific domains.
🌎 - Country
United Kingdom
💱 - Currency
£ GBP
💰 - Day rate
Unknown
🗓️ - Date
May 27, 2026
🕒 - Duration
Unknown
🏝️ - Location
Unknown
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
United Kingdom
🧠 - Skills detailed
#AWS (Amazon Web Services) #OpenSearch #Visualization #Athena #Spark (Apache Spark) #S3 (Amazon Simple Storage Service) #AI (Artificial Intelligence) #API (Application Programming Interface) #Metadata #PySpark #UAT (User Acceptance Testing) #Python #Data Orchestration #Cloud #Data Engineering #JSON (JavaScript Object Notation) #ML (Machine Learning)
Role description
Key responsibilities on this engagement
• Run the Sprint 1 architecture review of the existing UAT codebase (S3 + Glue + S3 Tables + OpenSearch + Athena) and deliver written gap findings.
• Design the metadata schema, taxonomy, and field catalogue (Light, Brain, Power).
• Tune data orchestration: Glue jobs, Athena queries, S3 Tables configuration, and scheduling.
• Lead the deep-dive technical sessions with analysts on visualization requirements.
• Build and validate the simulation data onboarding pipeline against real data, including the 30 GB-per-run acoustic spectra dataset.
• Configure and validate the OpenSearch k-NN vector store and the Bedrock embedding pipeline.
• Author the AI/ML data export format specification and the AI onboarding pattern document.
• Co-design the API middleware blueprint with the Cloud Infrastructure Architect.
Must-have
• Principal-level, hands-on data engineering on AWS (7+ years)
• Deep production experience with S3, S3 Tables, Glue, Athena, and OpenSearch (including k-NN / vector search)
• Built and shipped vector embedding workloads
• Strong metadata modelling and data taxonomy design experience for scientific or engineering domains
• Comfort working with Parquet, JSON-LD, and large binary scientific data formats (mesh, time-series, spectra)
• Python proficiency; PySpark / Glue job tuning experience






