Intelance

Data Engineer (OCR & Data Pipelines, Contract)

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Engineer (OCR & Data Pipelines) on a contract basis (2-3 days/week) with a competitive pay rate. Key requirements include 3-5+ years of Data Engineering experience, strong Python skills, and practical OCR experience, preferably in healthcare or regulated environments. Remote work is available.
🌎 - Country
United Kingdom
💱 - Currency
Β£ GBP
-
💰 - Day rate
750
-
🗓️ - Date
November 22, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United Kingdom
-
🧠 - Skills detailed
#Data Processing #Databricks #Monitoring #Metadata #Azure #AI (Artificial Intelligence) #Lean #ADF (Azure Data Factory) #REST (Representational State Transfer) #Data Engineering #Security #Data Pipeline #AWS (Amazon Web Services) #API (Application Programming Interface) #JSON (JavaScript Object Notation) #Data Quality #Azure Data Factory #ML (Machine Learning) #Logging #Python #Storage #Cloud #Batch #GCP (Google Cloud Platform) #ETL (Extract, Transform, Load)
Role description
Intelance is a specialist architecture and AI consultancy working with clients in regulated, high-trust environments (healthcare, pharma, life sciences, financial services). We are assembling a lean senior team to deliver an AI-assisted clinical report marking tool for a UK-based, UKAS-accredited organisation in human genetic testing. We are looking for a Data Engineer (OCR & Pipelines) who can turn messy PDFs and documents into clean, reliable, auditable data flows for ML and downstream systems. This is a contract/freelance role (2-3 days/week) working closely with our AI Solution Architect, Lead ML Engineer, and Integration Engineer.
Tasks
• Design and implement the end-to-end data pipeline for the project:
○ Ingest PDF/Word reports from secure storage
○ Run OCR / text extraction and layout parsing
○ Normalise, structure, and validate the data
○ Store outputs in a form ready for ML and integration.
• Evaluate and configure OCR / document AI services (e.g. Azure Form Recognizer or similar), and wrap them in robust, retry-safe, cost-aware scripts/services (an illustrative sketch follows at the end of this description).
• Define and implement data contracts and schemas between ingestion, ML, and integration components (JSON/Parquet/relational as appropriate).
• Build quality checks and validation rules (field presence, format, range checks, duplicate detection, basic anomaly checks).
• Implement logging, monitoring, and lineage so every processed document can be traced from source > OCR > structured output > model input.
• Work with the ML Engineer to ensure the pipeline exposes exactly the features and metadata needed for training, evaluation, and explainability.
• Collaborate with the Integration Engineer to deliver clean batch or streaming feeds into the client's assessment system (API, CSV exports, or SFTP drop-zone).
• Follow good security and privacy practices in all pipelines: encryption, access control, least privilege, and redaction where needed.
• Contribute to infrastructure decisions (storage layout, job orchestration, simple CI/CD for data jobs).
• Document the pipeline clearly: architecture diagrams, table/field definitions, data dictionaries, operational runbooks.
Requirements
Must-have
• 3-5+ years of hands-on Data Engineering experience.
• Strong Python skills, including building and packaging data processing scripts or services.
• Practical experience with OCR / document processing (e.g. Tesseract, Azure Form Recognizer, AWS Textract, Google Document AI, or equivalent).
• Solid experience building ETL / ELT pipelines on a major cloud platform (ideally Azure, but AWS/GCP is fine if you're comfortable switching).
• Good knowledge of data modelling and file formats (JSON, CSV, Parquet, relational schemas).
• Experience implementing data quality checks, logging, and monitoring for pipelines.
• Understanding of security and privacy basics: encryption at rest/in transit, access control, secure handling of potentially sensitive data.
• Comfortable working in a small, senior, remote team; able to take a loosely defined problem and design a clean, maintainable solution.
• Available for 2-3 days per week on a contract basis, working largely remotely in the UK or nearby European time zones.
Nice-to-have
• Experience in healthcare, life sciences, diagnostics, or other regulated environments.
• Familiarity with Azure Data Factory, Azure Functions, Databricks, or similar orchestration/compute tools.
• Knowledge of basic MLOps concepts (feature stores, model input/output formats).
• Experience with SFTP-based exchanges and batch integrations with legacy systems.
Benefits
• Core impact role: you own the pipeline that makes the entire AI solution possible – without you, nothing moves.
• Meaningful domain: your work supports external quality assessment in human genetic testing for labs worldwide.
• Lean, senior team: work alongside experienced architects and ML engineers; minimal bureaucracy, direct access to decision-makers.
• Remote-first, flexible: work from anywhere compatible with UK hours, 2-3 days/week.
• Contract / freelance: competitive day rate, with potential extension into further phases and additional schemes if the pilot is successful.
• Opportunity to build reusable data pipeline components that Intelance will deploy across future AI engagements.
We review every application personally. If there's a good match, we'll invite you to a short call to walk through the project, expectations, and next steps.
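For illustration only (not part of the client's codebase): a minimal Python sketch of the kind of retry-safe OCR wrapping and basic data-quality checks mentioned in the Tasks section. All names here (run_ocr_with_retries, validate_record, fake_ocr) are hypothetical; a real implementation would call the chosen OCR service's own SDK and log lineage metadata alongside each document.

```python
# Illustrative sketch only: hypothetical names, not an actual Intelance implementation.
# The OCR call is abstracted behind `ocr_fn` so any service (Tesseract, Azure
# Form Recognizer, AWS Textract, ...) can be plugged in via its own client.
import time
from typing import Any, Callable, Dict, List


def run_ocr_with_retries(ocr_fn: Callable[[bytes], Dict[str, Any]],
                         document: bytes,
                         max_attempts: int = 3,
                         backoff_seconds: float = 2.0) -> Dict[str, Any]:
    """Call an OCR function with exponential backoff so transient failures
    (timeouts, throttling) do not silently drop a document."""
    for attempt in range(1, max_attempts + 1):
        try:
            return ocr_fn(document)
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * 2 ** (attempt - 1))


def validate_record(record: Dict[str, Any],
                    required_fields: List[str]) -> List[str]:
    """Return a list of data-quality issues; an empty list means the record
    passed the basic field-presence checks."""
    issues = []
    for field in required_fields:
        if record.get(field) in (None, ""):
            issues.append(f"missing field: {field}")
    return issues


if __name__ == "__main__":
    # Stub OCR function standing in for a real service client.
    def fake_ocr(doc: bytes) -> Dict[str, Any]:
        return {"patient_id": "ABC123", "report_date": "2025-11-22", "result": ""}

    parsed = run_ocr_with_retries(fake_ocr, b"%PDF-...")
    problems = validate_record(parsed, ["patient_id", "report_date", "result"])
    print(problems)  # -> ['missing field: result']
```

The wrapper is deliberately service-agnostic, so the same retry and validation logic can sit in front of whichever OCR or document AI client the project settles on.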